ON THE GENERALIZED “BIRTH-AND-DEATH” PROCESS

By David G. Kendall
Magdalen College, Oxford 

1. Introduction and Summary. The importance of stochastic processes in
relation to problems of population growth was pointed out by W. Feller [1]
in 1939. He considered among other examples the “birth-and-death” process
in which the expected birth and death rates (per head of population per unit of
time) were constants, $\lambda_0$ and $\mu_0$, say. In this paper I shall give the complete
solution of the equations governing the generalised birth-and-death process
in which the birth and death rates $\lambda(t)$ and $\mu(t)$ may be any specified functions
of the time $t$. The mathematical method employed starts from M. S. Bartlett's
idea of replacing the differential-difference equations for the distribution of the
population size by a partial differential equation for its generating function. For
an account of this technique,¹ reference may be made to Bartlett's North Carolina
lectures [2].

The formulae obtained lead to an expression for the probability of the ultimate
extinction of the population, and to the necessary and sufficient condition for a
birth-and-death process to be of “transient” type. For transient processes
the distribution of the cumulative population is also considered, but here in
general it is not found possible to do more than evaluate its mean and variance
as functions of $t$, although a complete solution (including the determination of
the asymptotic form of the distribution as $t$ tends to infinity) is obtained for the
simple process in which the birth and death rates are independent of the time.

It is shown that a birth-and-death process can be constructed to give an
expected population size $\bar n_t$ which is any desired function of the time $t$, and among
the many possible solutions the unique one is determined which makes the
fluctuation, $\mathrm{Var}(n_t)$, a minimum for all $t$.

The general theory is illustrated with reference to two examples. The first
of these is the $(\lambda_0, \mu_1 t)$ process introduced by N. Arley [3] in his study of the
cascade showers associated with cosmic radiation; here the birth rate is constant
and the death rate is a constant multiple of the “age”, $t$, of the process. The
$\bar n_t$-curve is then Gaussian in form, and the process is always of transient type.

The second example is provided by the family of “periodic” processes, in
which the birth and death rates are periodic functions of the time $t$. These
appear well adapted to describe the response of population growth (or epidemic
spread) to the influence of the seasons.

2. The formulation and solution of the equations for the general $(\lambda, \mu)$ process.

Let the integer-valued time-dependent random variable $n_t$ measure at time $t$ the
size of a population, and suppose that in an interval of time $dt$ the only possible
transitions (and their associated probabilities) are:

$$
n_{t+dt} =
\begin{cases}
n_t + 1 & \text{with probability } \lambda(t)\,n_t\,dt + o(dt);\\
n_t & \text{with probability } 1 - \{\lambda(t) + \mu(t)\}\,n_t\,dt + o(dt);\\
n_t - 1 & \text{with probability } \mu(t)\,n_t\,dt + o(dt).
\end{cases}
\tag{1}
$$

¹ It appears from some remarks by Arley and Borchsenius [5] that the generating function method was first employed in problems of this kind by Dr. C. Palm.

As an initial condition it will be supposed that the population is descended from
a single “ancestor”, so that $n_0 = 1$, and thus

$$P_1(0) = 1, \qquad P_n(0) = 0 \quad (n \neq 1). \tag{2}$$

It then follows that the $P_n(t)$ must satisfy the differential-difference equations

$$\frac{d}{dt}P_n(t) = (n+1)\mu P_{n+1}(t) + (n-1)\lambda P_{n-1}(t) - n(\lambda+\mu)P_n(t), \qquad n \ge 1, \tag{3}$$

and

$$\frac{d}{dt}P_0(t) = \mu P_1(t) \tag{4}$$

(where for convenience of writing I have ceased to indicate explicitly the dependence
of $\lambda$ and $\mu$ on the time). If $P_n(t)$ is defined to be zero when $n < 0$,
the first of the above equations will then be true for all $n$, and accordingly the
generating function

$$\phi(z, t) = \sum_{n=0}^{\infty} P_n(t)\,z^n \tag{5}$$

must satisfy the linear partial differential equation

$$\frac{\partial \phi}{\partial t} = (z - 1)(\lambda z - \mu)\,\frac{\partial \phi}{\partial z}; \tag{6}$$

the problem is to find the solution to this equation when it is coupled with the
boundary condition $\phi(z, 0) = z$.
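As a numerical cross-check of (3) and (4), the Python sketch below integrates the truncated system with a classical fourth-order Runge-Kutta scheme. The truncation at `n_max` states, the step count, the function name `integrate_forward_equations` and the example rates are assumptions of this sketch, not part of the paper; for constant rates the computed $P_0(t)$ can be compared with Palm's closed form (16) given in section 3.

```python
import math


def integrate_forward_equations(lam, mu, t_end, n_max=80, steps=1000):
    """Integrate the truncated system (3)-(4) with classical fourth-order Runge-Kutta.

    lam, mu: callables giving lambda(t) and mu(t).
    States above n_max are dropped (a truncation assumed for this sketch),
    so n_max must be large enough that almost no probability reaches it.
    Returns [P_0(t_end), P_1(t_end), ..., P_{n_max}(t_end)].
    """
    def rhs(t, p):
        lt, mt = lam(t), mu(t)
        dp = [mt * p[1]] + [0.0] * n_max                      # equation (4)
        for n in range(1, n_max + 1):
            from_above = (n + 1) * mt * p[n + 1] if n < n_max else 0.0
            dp[n] = from_above + (n - 1) * lt * p[n - 1] - n * (lt + mt) * p[n]  # (3)
        return dp

    p = [0.0] * (n_max + 1)
    p[1] = 1.0                                                # condition (2)
    h = t_end / steps
    for i in range(steps):
        t = i * h
        k1 = rhs(t, p)
        k2 = rhs(t + h / 2, [x + h / 2 * k for x, k in zip(p, k1)])
        k3 = rhs(t + h / 2, [x + h / 2 * k for x, k in zip(p, k2)])
        k4 = rhs(t + h, [x + h * k for x, k in zip(p, k3)])
        p = [x + h / 6 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


# Constant rates lambda = 1, mu = 0.5, t = 2 (illustrative values).
p = integrate_forward_equations(lambda s: 1.0, lambda s: 0.5, 2.0)
nbar = math.exp(0.5 * 2.0)
print(p[0], 0.5 * (nbar - 1) / (1.0 * nbar - 0.5))   # numerical vs (16)
```

The step size is small enough that the fastest retained mode, with rate about $n_{\max}(\lambda + \mu)$, stays well inside the stability region of the scheme.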

The equation (6) is of Lagrange's type, and can be solved in the usual manner;
the auxiliary equation is

$$\frac{dz}{dt} = -\mu + (\lambda + \mu)z - \lambda z^2, \tag{7}$$

and while in particular examples it might be convenient to attack this equation
directly, progress in general is more easily made by observing that (7) is a
Riccati equation, for which a general theory is available.²

² See, for example, G. N. Watson [4], pp. 93-94.

The fundamental property of a Riccati equation is that the general solution is a homographic
function of the constant of integration, so that

$$z = \frac{f_1 + Cf_2}{f_3 + Cf_4},$$

and equally

$$C = \frac{f_1 - zf_3}{zf_4 - f_2},$$

where $f_1$, $f_2$, $f_3$ and $f_4$ are all functions of the time $t$. Thus the general solution
of (6) is of the form

$$\phi(z, t) = \Psi\left(\frac{f_1 - zf_3}{zf_4 - f_2}\right),$$

and from the boundary condition $\phi(z, 0) = z$ it then follows that

$$\phi(z, t) = \frac{g_1(t) + z\,g_2(t)}{g_3(t) + z\,g_4(t)}.$$


On expansion, one obtains

$$P_0(t) = \xi_t \quad\text{and}\quad P_n(t) = (1 - \xi_t)(1 - \eta_t)\,\eta_t^{\,n-1} \quad (n \ge 1), \tag{8}$$

where $\xi_t$ and $\eta_t$ are functions of the time $t$. Thus, for the general $(\lambda, \mu)$ process,
the population size at any time is distributed in a geometric series with a modified
zero term.

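The next stage of the solution is to determine the functions $\xi_t$ and $\eta_t$. From
(8),

$$\phi(z, t) = \frac{\xi_t + (1 - \xi_t - \eta_t)z}{1 - \eta_t z}, \tag{9}$$

and if this expression for $\phi$ be substituted in (6) it will be found³ that

$$\xi'\eta + (1 - \xi)\eta' = \lambda(1 - \xi)(1 - \eta)$$

and

$$\xi' = \mu(1 - \xi)(1 - \eta).$$

Now let $U = 1 - \xi$ and $V = 1 - \eta$, so that

$$U'/U = -\mu V$$

and

$$V' = (\mu - \lambda)V - \mu V^2.$$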

The last equation is of Bernoulli's type and can be solved by writing

$$W = 1/V,$$

so that

$$W' + (\mu - \lambda)W = \mu.$$

³ Here $\xi' = d\xi/dt$, etc.

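Initially $\xi = \eta = 0$ and $U = V = W = 1$; the solution of the $W$-equation is
therefore

$$W = e^{-\rho}\left\{1 + \int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau\right\}, \tag{10a}$$

where the function $\rho$ is defined by

$$\rho(t) = \int_0^t \{\mu(\tau) - \lambda(\tau)\}\,d\tau. \tag{11}$$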

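Integration by parts gives two other formulae for $W$ which will prove useful;
they are

$$W = 1 + e^{-\rho}\int_0^t e^{\rho(\tau)}\lambda(\tau)\,d\tau \tag{10b}$$

and

$$W = \tfrac{1}{2}(1 + e^{-\rho}) + \tfrac{1}{2}e^{-\rho}\int_0^t e^{\rho(\tau)}\{\lambda(\tau) + \mu(\tau)\}\,d\tau. \tag{10c}$$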

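The quantities $U$ and $V$, and hence also $\xi$ and $\eta$, can now be expressed in terms of
$\rho$ and $W$, for (since $V = 1/W$ and, by the $W$-equation, $\mu = W' + \rho' W$)

$$\frac{U'}{U} = -\mu V = -\frac{\mu}{W} = -\frac{W'}{W} - \rho',$$

so that $U = e^{-\rho}/W$, and therefore

$$\xi_t = 1 - \frac{e^{-\rho}}{W} \quad\text{and}\quad \eta_t = 1 - \frac{1}{W}. \tag{12}$$

These results, together with (8), suffice to determine completely the $P_n(t)$ as
functions of the time $t$.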
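The recipe (11), (10a), (12), (8) translates directly into code. The following Python sketch evaluates $\rho$, $W$, $\xi_t$ and $\eta_t$ for arbitrary rate functions; the cumulative trapezoidal rule, the grid size and the names `kendall_solution` and `prob_n` are assumptions made for illustration, not anything prescribed by the paper.

```python
import math


def kendall_solution(lam, mu, t, steps=20_000):
    """Return (rho, W, xi, eta) at time t for rate functions lam, mu.

    rho is (11), W is (10a) and xi, eta are (12).  Integrals use the
    cumulative trapezoidal rule on a uniform grid (an assumption of this sketch).
    """
    h = t / steps
    rho = 0.0                        # rho(s), accumulated step by step
    acc = 0.0                        # int_0^s e^{rho(tau)} mu(tau) dtau
    prev_rate = mu(0.0) - lam(0.0)
    prev_g = mu(0.0)                 # e^{rho(0)} mu(0), with rho(0) = 0
    for i in range(1, steps + 1):
        s = i * h
        rate = mu(s) - lam(s)
        rho += 0.5 * h * (prev_rate + rate)
        g = math.exp(rho) * mu(s)
        acc += 0.5 * h * (prev_g + g)
        prev_rate, prev_g = rate, g
    W = math.exp(-rho) * (1.0 + acc)
    return rho, W, 1.0 - math.exp(-rho) / W, 1.0 - 1.0 / W


def prob_n(xi, eta, n):
    """P_n(t) from the modified geometric law (8)."""
    return xi if n == 0 else (1 - xi) * (1 - eta) * eta ** (n - 1)


# Time-varying example: lambda(t) = 1 + 0.5 sin t, mu(t) = 0.8 (illustrative).
rho, W, xi, eta = kendall_solution(lambda s: 1 + 0.5 * math.sin(s), lambda s: 0.8, 3.0)
mean = sum(n * prob_n(xi, eta, n) for n in range(1, 2000))
print(mean, math.exp(-rho))   # the mean of (8) should agree with e^{-rho}
```

For constant rates the same routine reproduces the closed forms of section 3, which is a convenient check on the quadrature.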

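It is easy to deduce formulae for the mean and variance of $n_t$ (these could also
be obtained directly from (6)). For the mean,

$$\bar n_t = \frac{1 - \xi_t}{1 - \eta_t} = e^{-\rho(t)}, \tag{13}$$

while for the variance,

$$\mathrm{Var}(n_t) = \frac{(1 - \xi_t)(\xi_t + \eta_t)}{(1 - \eta_t)^2} = e^{-\rho}(2W - 1 - e^{-\rho}),$$

$$\mathrm{Var}(n_t) = e^{-2\rho(t)}\int_0^t e^{\rho(\tau)}\{\lambda(\tau) + \mu(\tau)\}\,d\tau. \tag{14c}$$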
Alternatively, using the other forms for $W$, one can write

$$\mathrm{Var}(n_t) = e^{-\rho}\left\{e^{-\rho} - 1 + 2e^{-\rho}\int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau\right\} \tag{14a}$$

$$\mathrm{Var}(n_t) = e^{-\rho}\left\{1 - e^{-\rho} + 2e^{-\rho}\int_0^t e^{\rho(\tau)}\lambda(\tau)\,d\tau\right\}. \tag{14b}$$

If the initial population $n_0 = N > 1$, these formulae for $\bar n_t$ and $\mathrm{Var}(n_t)$ are to
be multiplied by $N$.
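The three variance formulae differ only by an integration by parts, so they must agree numerically. The Python sketch below evaluates (14a), (14b) and (14c) side by side; the trapezoidal discretisation, the name `variance_three_ways` and the example rates are assumptions of the sketch.

```python
import math


def variance_three_ways(lam, mu, t, steps=20_000):
    """Evaluate Var(n_t) by (14a), (14b) and (14c) for rate functions lam, mu.

    rho(s) and the integrals of e^{rho} mu and e^{rho} lambda are accumulated
    with the trapezoidal rule (an assumed discretisation); the three results
    should agree up to discretisation error.
    """
    h = t / steps
    rho = 0.0
    i_mu = i_lam = 0.0
    prev = (mu(0.0) - lam(0.0), mu(0.0), lam(0.0))   # rho'(0), e^0 mu(0), e^0 lambda(0)
    for i in range(1, steps + 1):
        s = i * h
        rate = mu(s) - lam(s)
        rho += 0.5 * h * (prev[0] + rate)
        e = math.exp(rho)
        cur = (rate, e * mu(s), e * lam(s))
        i_mu += 0.5 * h * (prev[1] + cur[1])
        i_lam += 0.5 * h * (prev[2] + cur[2])
        prev = cur
    em = math.exp(-rho)
    var_a = em * (em - 1 + 2 * em * i_mu)      # (14a)
    var_b = em * (1 - em + 2 * em * i_lam)     # (14b)
    var_c = em * em * (i_lam + i_mu)           # (14c)
    return var_a, var_b, var_c


print(variance_three_ways(lambda s: 1 + 0.5 * math.sin(s),
                          lambda s: 0.8 + 0.3 * math.cos(s), 3.0))
```

With constant rates (14c) reduces to Feller's result (15), $\frac{\lambda_0+\mu_0}{\lambda_0-\mu_0}\bar n_t(\bar n_t - 1)$.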

It is now a simple matter to apply these formulae to the Arley $(\lambda_0, \mu_1 t)$ process.
It will be found that

$$\rho = \tfrac{1}{2}\mu_1 t^2 - \lambda_0 t$$

and

$$W = 1 + \lambda_0 e^{\lambda_0 t - \frac12\mu_1 t^2}\int_0^t e^{\frac12\mu_1\tau^2 - \lambda_0\tau}\,d\tau.$$

The mean growth of the process therefore follows the Gaussian law

$$\bar n_t = e^{\lambda_0 t - \frac12\mu_1 t^2},$$

while for the variance (using (14b), since $\lambda$ is a constant) one finds

$$\mathrm{Var}(n_t) = \bar n_t(1 - \bar n_t) + 2\lambda_0\bar n_t^{\,2}\int_0^t e^{\frac12\mu_1\tau^2 - \lambda_0\tau}\,d\tau,$$

in agreement with Arley [3] and Bartlett [2]. The distribution of $n_t$ at time $t$
follows on inserting the above values of $\rho$ and $W$ into (8) and (12).
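A quick way to check the Arley formulae is to compare them with a direct integration of the moment equations $d\bar n_t/dt = (\lambda - \mu)\bar n_t$ and $d\,\mathrm{Var}(n_t)/dt = (\lambda + \mu)\bar n_t + 2(\lambda - \mu)\mathrm{Var}(n_t)$, the second of which follows by differentiating (14c). In this Python sketch the quadrature, the RK4 integrator, the function names and the parameter values are all illustrative assumptions.

```python
import math


def arley_closed_form(lam0, mu1, t, steps=20_000):
    """Mean and variance of the Arley (lam0, mu1*t) process from the formulas above.

    The integral of exp(mu1 tau^2 / 2 - lam0 tau) uses the trapezoidal rule
    (an assumption of this sketch).
    """
    f = lambda s: math.exp(0.5 * mu1 * s * s - lam0 * s)
    h = t / steps
    integral = h * (0.5 * f(0.0) + sum(f(i * h) for i in range(1, steps)) + 0.5 * f(t))
    nbar = math.exp(lam0 * t - 0.5 * mu1 * t * t)
    return nbar, nbar * (1 - nbar) + 2 * lam0 * nbar ** 2 * integral


def moments_by_ode(lam, mu, t, steps=4_000):
    """RK4 for m' = (lam - mu) m and v' = (lam + mu) m + 2 (lam - mu) v, m(0) = 1, v(0) = 0."""
    def rhs(s, m, v):
        l, u = lam(s), mu(s)
        return (l - u) * m, (l + u) * m + 2 * (l - u) * v

    h, m, v = t / steps, 1.0, 0.0
    for i in range(steps):
        s = i * h
        a1, b1 = rhs(s, m, v)
        a2, b2 = rhs(s + h / 2, m + h / 2 * a1, v + h / 2 * b1)
        a3, b3 = rhs(s + h / 2, m + h / 2 * a2, v + h / 2 * b2)
        a4, b4 = rhs(s + h, m + h * a3, v + h * b3)
        m += h / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
        v += h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
    return m, v


lam0, mu1, t = 1.0, 0.5, 2.0
print(arley_closed_form(lam0, mu1, t))
print(moments_by_ode(lambda s: lam0, lambda s: mu1 * s, t))
```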

3. The chances of extinction. The simplest special case is that in which
$(\lambda, \mu)$ have the constant values $(\lambda_0, \mu_0)$; this is the process introduced by Feller
[1] and later discussed by several writers.⁴ The formulae (13) and (14c) give
at once the results

$$\bar n_t = e^{(\lambda_0 - \mu_0)t} \quad\text{and}\quad \mathrm{Var}(n_t) = \frac{\lambda_0 + \mu_0}{\lambda_0 - \mu_0}\,\bar n_t(\bar n_t - 1), \tag{15}$$

due to Feller, while since

$$W = \frac{\lambda_0\bar n_t - \mu_0}{\lambda_0 - \mu_0},$$

equations (8) and (12) give

$$P_0(t) = \frac{\mu_0(\bar n_t - 1)}{\lambda_0\bar n_t - \mu_0} \quad\text{and}\quad P_n(t) = \{1 - P_0(t)\}(1 - \eta_t)\,\eta_t^{\,n-1} \quad (n \ge 1), \tag{16}$$

⁴ See Arley [3], Arley and Borchsenius [5], Bartlett [2] and Kendall [6]. Palm's formulae
(16) are stated without proof by Arley and Borchsenius, but it appears from their remarks
that he used a generating function method probably identical with that later employed by
Bartlett and myself.
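where

$$\eta_t = \frac{\lambda_0}{\mu_0}P_0(t) = \frac{\lambda_0(\bar n_t - 1)}{\lambda_0\bar n_t - \mu_0}.$$

These formulae were first given by C. Palm. They actually hold only if
$\lambda_0 \neq \mu_0$; in the case of equality, $W = 1 + \lambda_0 t$, and then

$$\bar n_t = 1, \qquad \mathrm{Var}(n_t) = 2\lambda_0 t,$$

$$P_0(t) = \frac{\lambda_0 t}{1 + \lambda_0 t} \quad\text{and}\quad P_n(t) = (1 - \omega_t)^2\,\omega_t^{\,n-1} \quad (n \ge 1), \tag{17}$$

where $\omega_t = P_0(t)$.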
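The statement that (16) holds only for $\lambda_0 \neq \mu_0$, with (17) as the balanced case, can be checked numerically: (17) should be the limit of (16) as $\lambda_0 \to \mu_0$. This Python sketch (the name `palm_probabilities` and the parameter values are illustrative) uses `math.expm1` for $\bar n_t - 1$ so that the nearly balanced case does not lose precision to cancellation.

```python
import math


def palm_probabilities(lam0, mu0, t, n_terms):
    """[P_0(t), ..., P_{n_terms-1}(t)] for constant rates, from (16), or from (17) when lam0 == mu0."""
    if lam0 == mu0:
        p0 = lam0 * t / (1 + lam0 * t)        # (17)
        eta = p0                              # omega_t = P_0(t)
    else:
        d = math.expm1((lam0 - mu0) * t)      # nbar - 1, computed accurately
        denom = (lam0 - mu0) + lam0 * d       # lam0 * nbar - mu0
        p0 = mu0 * d / denom                  # (16)
        eta = lam0 * d / denom                # eta_t
    return [p0] + [(1 - p0) * (1 - eta) * eta ** (n - 1) for n in range(1, n_terms)]


print(palm_probabilities(1.0 + 1e-6, 1.0, 2.0, 4))   # nearly balanced, via (16)
print(palm_probabilities(1.0, 1.0, 2.0, 4))          # balanced, via (17)
```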

One particularly interesting point is that

$$P_0(t) \to 1 \quad\text{as } t \to \infty \text{ if } \lambda_0 \le \mu_0,$$

so that the population is “almost certain” to die out, even though in the critical
case ($\lambda_0 = \mu_0$) the expected population size $\bar n_t$ has a constant value. The same is
true for any initial size of population; the new expression for $P_0(t)$ is then simply
equal to the former one raised to the power $n_0 = N$, and therefore tends to unity
as before. This phenomenon of extinction was first noticed in a similar problem⁵
by Francis Galton and H. W. Watson; an account of their work is given in
Appendix F of Galton's book [7].

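The formulae of the last section now make possible a discussion of the chances
of extinction for the general $(\lambda, \mu)$ process. When $n_0 = 1$, it follows from (10a) and (12) that

$$P_0(t) = \frac{\int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau}{1 + \int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau}, \tag{18}$$

and so the necessary and sufficient condition for the ultimate extinction of the
population is that the integral

$$I = \int_0^\infty e^{\rho(\tau)}\mu(\tau)\,d\tau \tag{19}$$

should be divergent.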

It is evident that the integrand of (19) is non-negative, and so the integral must
either diverge to plus infinity or have a finite value. Hence in any
case the population always has a definite chance of extinction, given by $I/(1 + I)$
(to be read as unity when $I$ diverges). When $n_0 = N > 1$, the population size has the generating function

$$\{\phi(z, t)\}^N = \left\{\frac{\xi_t + (1 - \xi_t - \eta_t)z}{1 - \eta_t z}\right\}^N, \tag{20}$$

⁵ The extinction of family-names. Further references will be found in my paper [8].
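so that

$$P_0(t) = \xi_t^{\,N} = \left\{\frac{\int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau}{1 + \int_0^t e^{\rho(\tau)}\mu(\tau)\,d\tau}\right\}^N,$$

and the chance of ultimate extinction is

$$\left(\frac{I}{1 + I}\right)^N, \tag{21}$$

which is or is not equal to unity for all $N$ indifferently.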
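Formula (21) is easy to evaluate once $I$ is known. The Python sketch below approximates $I$ by truncating the integral (19) at a finite horizon; that truncation, the grid, and the name `ultimate_extinction` are assumptions of the sketch, adequate only when the partial integral has effectively converged (or is already huge, for a transient process). For constant rates with $\lambda_0 > \mu_0$ one has $I = \mu_0/(\lambda_0 - \mu_0)$, so $I/(1 + I) = \mu_0/\lambda_0$.

```python
import math


def ultimate_extinction(lam, mu, N=1, horizon=50.0, steps=50_000):
    """Approximate {I/(1+I)}^N of (21), with I from (19) truncated at `horizon`."""
    h = horizon / steps
    rho, I = 0.0, 0.0
    prev_rate, prev_g = mu(0.0) - lam(0.0), mu(0.0)
    for i in range(1, steps + 1):
        s = i * h
        rate = mu(s) - lam(s)
        rho += 0.5 * h * (prev_rate + rate)    # rho(s) of (11), trapezoidal rule
        g = math.exp(rho) * mu(s)
        I += 0.5 * h * (prev_g + g)
        prev_rate, prev_g = rate, g
    return (I / (1 + I)) ** N


# Supercritical constant rates lambda0 = 2, mu0 = 1, N = 3: expect (1/2)**3.
print(ultimate_extinction(lambda s: 2.0, lambda s: 1.0, N=3))
```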

Extinction is impossible, in the sense of being an event of zero probability, if
and only if $\mu$ is identically zero, so that the process is one of reproduction only.
It is also worth noting that a necessary but not sufficient condition for almost
certain extinction is the divergence of the integral

$$\int_0^\infty \mu(\tau)\,d\tau. \tag{22}$$

For if (22) had a finite value, $\rho(t)$ would be bounded above for all $t$ (since
$\rho(t) \le \int_0^t \mu(\tau)\,d\tau$), and so (19) could not be divergent. In general, when $I = \infty$ and
the population is almost certainly doomed to extinction, I shall speak of the process as transient.

For a transient process it is of interest to consider the random variable $T$,
defined to be the “age” of the process at the moment of extinction. Since

$$P_0(t) = \Pr\{T \le t\},$$

the probability distribution of $T$ is $P_0'(T)\,dT$, or

$$\frac{e^{\rho(T)}\mu(T)\,dT}{\left\{1 + \int_0^T e^{\rho(\tau)}\mu(\tau)\,d\tau\right\}^2}, \qquad 0 < T < \infty. \tag{23}$$


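For example, in the simplest birth-and-death process, when $\lambda$ and $\mu$ are equal
constants, the distribution of $T$ is

$$\frac{\lambda_0\,dT}{(1 + \lambda_0 T)^2}, \qquad 0 < T < \infty. \tag{24}$$

This is for an initial population $n_0 = 1$; more generally, when $n_0 = N > 1$, the
distribution of $T$ is

$$N P_0'(T)\{P_0(T)\}^{N-1}\,dT.$$

The median life-time $T_m$ is determined by the relation

$$\int_0^{T_m} e^{\rho(\tau)}\mu(\tau)\,d\tau = 1, \tag{25}$$

since this makes $P_0(T_m) = \tfrac12$ in (18).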

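For the simple process, $T_m = 1/\lambda_0$ when $\lambda_0 = \mu_0$, and more generally

$$T_m = \frac{1}{\mu_0 - \lambda_0}\log\frac{2\mu_0 - \lambda_0}{\mu_0} \qquad (\lambda_0 \neq \mu_0) \tag{26}$$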
if $n_0 = 1$. When $n_0 = N > 1$, the formula for $T_m$ becomes

$$\int_0^{T_m(N)} e^{\rho(\tau)}\mu(\tau)\,d\tau = \frac{1}{2^{1/N} - 1}. \tag{27}$$

For the balanced process $(\lambda_0, \lambda_0)$ it therefore follows that

$$T_m(N) = \frac{T_m(1)}{2^{1/N} - 1} \sim \frac{N\,T_m(1)}{\log 2} \approx 1.44\,N\,T_m(1) \tag{28}$$

as $N$ tends to infinity. If the process is unbalanced, however, so that $\lambda_0 < \mu_0$,
this asymptotic proportionality to $N$ does not hold, and instead

$$T_m(N) = \frac{1}{\mu_0 - \lambda_0}\log\left\{1 + \frac{\mu_0 - \lambda_0}{(2^{1/N} - 1)\mu_0}\right\} \sim \frac{\log N}{\mu_0 - \lambda_0} \tag{29}$$

as $N$ tends to infinity.
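For the simple process, (27) can be solved in closed form, giving (26) when $N = 1$ and the exact expression in (29) in general. The Python sketch below (function names hypothetical; it assumes the transient case $\lambda_0 \le \mu_0$) computes $T_m(N)$, checks that $P_0(T_m)^N = \tfrac12$ using (16), and shows the balanced ratio $T_m(N)/\{N\,T_m(1)\}$ approaching $1/\log 2$.

```python
import math


def median_lifetime(lam0, mu0, N=1):
    """Median extinction time T_m(N) of the simple (lam0, mu0) process, solving (27).

    This sketch assumes the transient case lam0 <= mu0.
    """
    target = 1.0 / (2.0 ** (1.0 / N) - 1.0)        # right-hand side of (27)
    if lam0 == mu0:
        return target / lam0                       # balanced case, cf. (28)
    a = mu0 - lam0
    return math.log1p(a * target / mu0) / a        # (26) when N = 1, (29) in general


def p0_simple(lam0, mu0, t):
    """P_0(t) from (16), or from (17) in the balanced case."""
    if lam0 == mu0:
        return lam0 * t / (1 + lam0 * t)
    d = math.expm1((lam0 - mu0) * t)
    return mu0 * d / ((lam0 - mu0) + lam0 * d)


print(p0_simple(0.5, 1.0, median_lifetime(0.5, 1.0, N=4)) ** 4)           # about 1/2
print(median_lifetime(1.0, 1.0, N=1000) / (1000 * median_lifetime(1.0, 1.0)))
```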


4. The cumulative population. There is associated with a birth-and-death
process another random variable, $M_t$, which is of importance in some applications.
This is defined as follows: initially $M_0 = n_0$, while for $t > 0$, $M_t$ shares
all the positive jumps of $n_t$.

For example, if $n_t$ represents the number of cases of a disease in a population
at time $t$, $M_t$ will be the total number of cases which have been recorded up to
that time. If the process is transient, so that the epidemic is almost certainly
extinguished in the course of time, $M_\infty$ will then be a measure of its overall
severity.

Again, if $n_t$ represents the viable count of a population of bacteria⁶ with a birth
rate $\lambda(t)$ and a death rate $\mu(t)$, $M_t$ will be equal to the total count in which living
and dead organisms are not distinguished.

In order to discuss the joint variation of $n_t$ and $M_t$, it is necessary to introduce
the new generating function

$$\psi(z, w, t) = \sum_{n=0}^{\infty}\sum_{M=0}^{\infty} P_{n,M}(t)\,z^n w^M. \tag{30}$$

Here the $P_{n,M}(t)$ give the joint frequency-distribution of $n_t$ and $M_t$ at time $t$.

By the usual argument the differential equation satisfied by the function $\psi$
will be found to be

$$\frac{\partial\psi}{\partial t} = \{\lambda z^2 w - (\lambda + \mu)z + \mu\}\frac{\partial\psi}{\partial z}, \tag{31}$$

and the associated boundary condition (if initially $n_0 = M_0 = 1$) is

$$\psi(z, w, 0) = zw. \tag{32}$$

This equation will not be solved here for general $\lambda(t)$ and $\mu(t)$; the solution
when $\lambda$ and $\mu$ are constants will be given in the next section. It is, however,
possible to find general expressions for the mean and variance of $M_t$; for this
purpose it is more convenient⁷ to work with the cumulant-generating function

$$K(u, v, t) = \log\psi(e^u, e^v, t). \tag{33}$$

This satisfies the differential equation

$$\frac{\partial K}{\partial t} = \{\lambda(e^{u+v} - 1) + \mu(e^{-u} - 1)\}\frac{\partial K}{\partial u}, \tag{34}$$

and of course

$$K = u\bar n_t + v\bar M_t + \tfrac12u^2\,\mathrm{Var}(n_t) + \tfrac12v^2\,\mathrm{Var}(M_t) + uv\,\mathrm{Cov}(n_t, M_t) + \cdots. \tag{35}$$

⁶ […] processes […] in relation to bacterial […]

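Expanding both sides of the equation in powers of $u$ and $v$, and equating coefficients,
one obtains the differential equations

$$\frac{d}{dt}\bar n_t = (\lambda - \mu)\bar n_t, \tag{36}$$

$$\frac{d}{dt}\mathrm{Var}(n_t) = (\lambda + \mu)\bar n_t + 2(\lambda - \mu)\mathrm{Var}(n_t), \tag{37}$$

$$\frac{d}{dt}\bar M_t = \lambda\bar n_t, \tag{38}$$

$$\frac{d}{dt}\mathrm{Var}(M_t) = \lambda\bar n_t + 2\lambda\,\mathrm{Cov}(n_t, M_t), \tag{39}$$

and

$$\frac{d}{dt}\mathrm{Cov}(n_t, M_t) = \lambda\bar n_t + \lambda\,\mathrm{Var}(n_t) + (\lambda - \mu)\mathrm{Cov}(n_t, M_t). \tag{40}$$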


The solutions to the first two equations have of course already been given in
section 2; from the third it follows that the mean value of $M_t$ is

$$\bar M_t = 1 + \int_0^t e^{-\rho(\tau)}\lambda(\tau)\,d\tau. \tag{41}$$

The solution of the fifth equation is

$$\mathrm{Cov}(n_t, M_t) = \bar n_t\int_0^t \left\{1 + \frac{\mathrm{Var}(n_\tau)}{\bar n_\tau}\right\}\lambda(\tau)\,d\tau, \tag{42}$$

and so the variance of $M_t$ is

$$\mathrm{Var}(M_t) = \int_0^t \{\bar n_\tau + 2\,\mathrm{Cov}(n_\tau, M_\tau)\}\lambda(\tau)\,d\tau. \tag{43}$$

⁷ Compare Bartlett [2].
In illustration of these formulae, consider first the Arley $(\lambda_0, \mu_1 t)$ process; from
(41)

$$\bar M_t = 1 + \lambda_0\int_0^t e^{\lambda_0\tau - \frac12\mu_1\tau^2}\,d\tau, \tag{44}$$

but the complete expression for $\mathrm{Var}(M_t)$ will be a multiple integral which does
not appear to admit of much simplification.

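For the simple $(\lambda_0, \mu_0)$ process, however, when $\lambda_0 < \mu_0$, it readily follows that

$$\bar M_t = \frac{\mu_0 - \lambda_0\bar n_t}{\mu_0 - \lambda_0}, \tag{45}$$

$$\mathrm{Cov}(n_t, M_t) = \frac{\lambda_0\bar n_t}{\mu_0 - \lambda_0}\left\{2\mu_0 t - \frac{\mu_0 + \lambda_0}{\mu_0 - \lambda_0}(1 - \bar n_t)\right\}, \tag{46}$$

and

$$\mathrm{Var}(M_t) = \frac{\lambda_0(\mu_0 + \lambda_0)}{(\mu_0 - \lambda_0)^3}(1 - \bar n_t)(\mu_0 + \lambda_0\bar n_t) - \frac{4\lambda_0^2\mu_0\,t\,\bar n_t}{(\mu_0 - \lambda_0)^2}. \tag{47}$$

Thus in the limit, as $t \to \infty$, the mean and variance of $M_\infty$ are

$$\bar M_\infty = \frac{\mu_0}{\mu_0 - \lambda_0} \quad\text{and}\quad \mathrm{Var}(M_\infty) = \frac{\lambda_0\mu_0(\mu_0 + \lambda_0)}{(\mu_0 - \lambda_0)^3}, \tag{48}$$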


the covariance of course tending to zero. If the process is balanced, so that
$\lambda_0 = \mu_0$ and $\bar n_t = 1$, the integral for $\bar M_t$ has the value $1 + \lambda_0 t$, which increases
without limit as $t$ tends to infinity. This will always be so for a balanced process
if the integral

$$\int_0^\infty \lambda(\tau)\,d\tau$$

is divergent.

If the initial population $n_0$ is equal to $N > 1$, and if all its members are counted
in $M_0$, the only modification necessary to the above formulae is that in each
case the right-hand side is to be multiplied by $N$.
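The closed forms (45)-(48) can be checked against a direct numerical integration of the moment equations (36)-(40). In the Python sketch below, the RK4 integrator, the step count, the function names and the rates $\lambda_0 = 0.5$, $\mu_0 = 1$ are illustrative assumptions; the initial state corresponds to $n_0 = M_0 = 1$.

```python
import math


def moment_odes(lam, mu, t, steps=5_000):
    """RK4 integration of (36)-(40); state = (nbar, Var n, Mbar, Var M, Cov(n, M))."""
    def rhs(s, y):
        nb, vn, mb, vm, c = y
        l, u = lam(s), mu(s)
        return ((l - u) * nb,                        # (36)
                (l + u) * nb + 2 * (l - u) * vn,     # (37)
                l * nb,                              # (38)
                l * nb + 2 * l * c,                  # (39)
                l * nb + l * vn + (l - u) * c)       # (40)

    h, y = t / steps, (1.0, 0.0, 1.0, 0.0, 0.0)      # n_0 = M_0 = 1
    for i in range(steps):
        s = i * h
        k1 = rhs(s, y)
        k2 = rhs(s + h / 2, tuple(a + h / 2 * b for a, b in zip(y, k1)))
        k3 = rhs(s + h / 2, tuple(a + h / 2 * b for a, b in zip(y, k2)))
        k4 = rhs(s + h, tuple(a + h * b for a, b in zip(y, k3)))
        y = tuple(a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                  for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4))
    return y


def simple_process_closed_form(lam0, mu0, t):
    """Mbar, Cov(n, M) and Var(M) from (45), (46) and (47), for lam0 < mu0."""
    a = mu0 - lam0
    nb = math.exp(-a * t)
    mbar = (mu0 - lam0 * nb) / a
    cov = lam0 * nb / a * (2 * mu0 * t - (mu0 + lam0) * (1 - nb) / a)
    var_m = (lam0 * (mu0 + lam0) * (1 - nb) * (mu0 + lam0 * nb) / a ** 3
             - 4 * lam0 ** 2 * mu0 * t * nb / a ** 2)
    return mbar, cov, var_m


nb, vn, mb, vm, c = moment_odes(lambda s: 0.5, lambda s: 1.0, 5.0)
print((mb, c, vm), simple_process_closed_form(0.5, 1.0, 5.0))
```

For large $t$ the closed forms approach the limits (48), here $\bar M_\infty = 2$ and $\mathrm{Var}(M_\infty) = 6$.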

5. The distribution of the cumulative population for a simple process. The
differential equation (31), which appears to be intractable in the general case even if
one only requires the asymptotic distribution of $M_t$ as $t \to \infty$, can be solved
completely in the specially simple case in which the birth and death rates $\lambda(t)$ and
$\mu(t)$ are constants.

Let $\alpha$ and $\beta$ be the roots of the quadratic

$$\lambda_0 w z^2 - (\lambda_0 + \mu_0)z + \mu_0 = 0, \tag{49}$$

so chosen that $0 < \alpha < 1 < \beta$ (for $0 < w < 1$); then the general solution of (31) will be found by
the usual method to be

$$\psi = \Psi\left\{\frac{z - \alpha}{\beta - z}\,e^{-\lambda_0 w(\beta - \alpha)t}\right\}.$$

The boundary condition $\psi(z, w, 0) = zw$ therefore gives

$$\psi = w\,\frac{\alpha(\beta - z) + \beta(z - \alpha)e^{-\lambda_0 w(\beta - \alpha)t}}{(\beta - z) + (z - \alpha)e^{-\lambda_0 w(\beta - \alpha)t}}, \tag{50}$$

and it may be noted that if $n_0 = M_0 = N > 1$, this formula for $\psi$ would have to
be raised to the $N$th power. It will suffice, however, to discuss the simplest
case when $n_0 = M_0 = 1$.

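Let the process be transient, so that $\lambda_0 < \mu_0$; then the asymptotic frequency
distribution of $M_t$ when $t \to \infty$ is determined by the generating function

$$\psi(1, w, \infty) = w\alpha = \frac{\lambda_0 + \mu_0 - \sqrt{(\lambda_0 + \mu_0)^2 - 4\lambda_0\mu_0 w}}{2\lambda_0}, \tag{51}$$

and here it is the positive square root which must be taken. The probability
distribution of $M_\infty$ is thus

$$p_M = \frac{\lambda_0 + \mu_0}{2\lambda_0}\cdot\frac{(2M)!}{2^{2M}(M!)^2}\cdot\frac{x^M}{2M - 1} \qquad (M = 1, 2, 3, \ldots), \tag{52}$$

where

$$x = \frac{4\lambda_0\mu_0}{(\lambda_0 + \mu_0)^2}. \tag{53}$$

The first few terms are

$$\frac{\lambda_0 + \mu_0}{2\lambda_0}\left(\tfrac12 x,\ \tfrac18 x^2,\ \tfrac1{16}x^3,\ \tfrac{5}{128}x^4,\ \ldots\right), \tag{54}$$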

and it is easy to verify that the mean and variance of this distribution agree
with the values given in the last section. When $\lambda_0 = \mu_0$, $x = 1$, and then the
terms in (54) fall off to zero like $M^{-3/2}$, $\bar M_\infty$ being infinite (in accordance with the
remarks at the end of section 4).
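The claim that (52) has the mean and variance (48) can be verified numerically. The Python sketch below (name `total_progeny_distribution` hypothetical, rates illustrative) builds the factor $(2M)!/\{2^{2M}(M!)^2(2M-1)\}$ by the recurrence $c_{M+1} = c_M(2M-1)/(2M+2)$, $c_1 = \tfrac12$, which avoids huge factorials; it assumes $\lambda_0 < \mu_0$ so that $x < 1$ and the series converges geometrically.

```python
import math


def total_progeny_distribution(lam0, mu0, m_max=2_000):
    """[p_1, ..., p_{m_max}] from (52)-(53), assuming lam0 < mu0."""
    x = 4 * lam0 * mu0 / (lam0 + mu0) ** 2          # (53)
    lead = (lam0 + mu0) / (2 * lam0)
    probs, c, xm = [], 0.5, x                       # c_1 = 1/2, xm = x^M
    for M in range(1, m_max + 1):
        probs.append(lead * c * xm)
        c *= (2 * M - 1) / (2 * M + 2)              # c_{M+1} from c_M
        xm *= x
    return probs


p = total_progeny_distribution(0.5, 1.0)
mean = sum(M * q for M, q in enumerate(p, start=1))
var = sum(M * M * q for M, q in enumerate(p, start=1)) - mean ** 2
print(sum(p), mean, var)   # compare with 1, mu0/(mu0 - lam0) and Var(M_inf) of (48)
```

The first term equals $\mu_0/(\lambda_0 + \mu_0)$, the chance that the ancestor dies before giving birth.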


6. The determination of the process when its mean growth, $\bar n_t$, is given.

Since $\bar n_t = e^{-\rho(t)}$, it follows that

$$\lambda(t) - \mu(t) = \frac{d}{dt}\log\bar n_t, \tag{55}$$

and thus if $\bar n_t$ is required to be a given function of the time, the birth and death
rates must be chosen in accordance with (55); the only other condition is that
for all $t$, $\lambda(t) \ge 0$ and $\mu(t) \ge 0$.

Arley has pointed out that the simple process ($\lambda(t) = c$, $\mu(t) = 0$) gives a
smaller fluctuation, $\mathrm{Var}(n_t)$, than any other simple process with the same mean
growth, say $(\lambda_0, \mu_0)$ where $\lambda_0 - \mu_0 = c$. This suggests that one should consider
the more general question: if $\bar n_t$ is given for all $t$, for which choice of the functions
$\lambda(t)$ and $\mu(t)$ will the fluctuation $\mathrm{Var}(n_t)$ be a minimum?

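Suppose then that the whole region $t > 0$ consists of three sets of intervals,
$E_1$, $E_2$ and $E_3$, and that within an interval of the set $E_j$,

$\bar n_t$ is a decreasing function if $j = 1$,
$\bar n_t$ is an increasing function if $j = 2$,
and $\bar n_t$ is a constant if $j = 3$.

Then, writing $\lambda + \mu = \rho' + 2\lambda$ on $E_1$ (where $\rho' \ge 0$) and $\lambda + \mu = -\rho' + 2\mu$
on $E_2$ (where $\rho' \le 0$) in (14c), one can write

$$
\begin{aligned}
\mathrm{Var}(n_t) ={}& e^{-2\rho(t)}\sum_{E_1}\left[e^{\rho}\right] + 2e^{-2\rho(t)}\int_{E_1} e^{\rho(\tau)}\lambda(\tau)\,d\tau\\
&+ e^{-2\rho(t)}\sum_{E_2}\left[-e^{\rho}\right] + 2e^{-2\rho(t)}\int_{E_2} e^{\rho(\tau)}\mu(\tau)\,d\tau\\
&+ e^{-2\rho(t)}\int_{E_3} e^{\rho(\tau)}\{\lambda(\tau) + \mu(\tau)\}\,d\tau,
\end{aligned}
$$

where each integral, and each sum of increments $[\,\cdot\,]$ across an interval, is taken
over the part of the set lying in $(0, t)$. Here the terms involving $\lambda$ and $\mu$ explicitly
are all non-negative, while the others depend only on $\bar n_t = e^{-\rho(t)}$, and so
$\mathrm{Var}(n_t)$ will be a minimum for the (unique) choice of $\lambda$ and $\mu$ which makes them all
vanish, namely:

$$
\begin{aligned}
&\text{in } E_1: && \lambda(t) = 0 \ \text{ and } \ \mu(t) = -\bar n_t'/\bar n_t;\\
&\text{in } E_2: && \lambda(t) = \bar n_t'/\bar n_t \ \text{ and } \ \mu(t) = 0;\\
&\text{in } E_3: && \lambda(t) = \mu(t) = 0.
\end{aligned}
\tag{56}
$$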
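The rule (56) can be applied mechanically once $\bar n_t$ and its derivative are given. The Python sketch below (function names hypothetical; the trapezoidal evaluation of (14c) is an assumption) does this for the logistic mean (58) with illustrative values $a = 5$, $\beta = 1$. When $\bar n_t$ is increasing throughout, a short calculation from (14c) with $\mu = 0$ gives the minimum as $\bar n_t(\bar n_t - 1)$, and adding the same positive constant to both rates keeps the mean but should increase the variance.

```python
import math


def min_fluctuation_rates(nbar, dnbar, t):
    """(lambda(t), mu(t)) from (56), given the mean growth nbar(t) and its derivative."""
    g = dnbar(t) / nbar(t)                          # d/dt log nbar = lambda - mu, by (55)
    return (g, 0.0) if g > 0 else (0.0, abs(g))     # E_2, otherwise E_1 or E_3


def variance_14c(lam, mu, t, steps=20_000):
    """Var(n_t) from (14c), with trapezoidal quadrature (an assumption of this sketch)."""
    h, rho, acc = t / steps, 0.0, 0.0
    prev_rate, prev_g = mu(0.0) - lam(0.0), lam(0.0) + mu(0.0)
    for i in range(1, steps + 1):
        s = i * h
        rate = mu(s) - lam(s)
        rho += 0.5 * h * (prev_rate + rate)
        g = math.exp(rho) * (lam(s) + mu(s))
        acc += 0.5 * h * (prev_g + g)
        prev_rate, prev_g = rate, g
    return math.exp(-2 * rho) * acc


# Logistic mean growth (58) with illustrative a = 5, beta = 1.
a, beta = 5.0, 1.0
nbar = lambda t: a / (1 + (a - 1) * math.exp(-beta * t))
dnbar = lambda t: beta * nbar(t) * (1 - nbar(t) / a)
lam_min = lambda t: min_fluctuation_rates(nbar, dnbar, t)[0]
mu_min = lambda t: min_fluctuation_rates(nbar, dnbar, t)[1]
v_min = variance_14c(lam_min, mu_min, 4.0)
v_other = variance_14c(lambda t: lam_min(t) + 0.3, lambda t: mu_min(t) + 0.3, 4.0)
print(v_min, nbar(4.0) * (nbar(4.0) - 1), v_other)
```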

However, when one is looking for a $(\lambda, \mu)$ process with a given $\bar n_t$ function,
this minimum-fluctuation solution would frequently be an artificial one. For
example, suppose it is required that $\bar n_t$ shall be a Gaussian curve, reducing to
unity when $t = 0$; then

$$\bar n_t = e^{\lambda_0 t - \frac12\mu_1 t^2}, \tag{57}$$

say, and $\lambda(t) - \mu(t) = \lambda_0 - \mu_1 t$; the most natural solution is then the Arley process,

$$\lambda(t) = \lambda_0, \qquad \mu(t) = \mu_1 t.$$


It is of interest that a $(\lambda, \mu)$ process can be found for which the expected growth
follows a logistic law,

$$\bar n_t = \frac{a}{1 + (a - 1)e^{-\beta t}} \qquad (a > 1,\ \beta > 0). \tag{58}$$

According to (55) one must have

$$\lambda(t) - \mu(t) = \frac{(a - 1)\beta}{e^{\beta t} + a - 1}.$$
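Since $\bar n_t$ is increasing for all $t$, the whole region $t > 0$ is of type $E_2$, and the
minimum-fluctuation solution is

$$\lambda(t) = \frac{(a - 1)\beta}{e^{\beta t} + a - 1}, \qquad \mu(t) = 0, \tag{59}$$

which satisfies the relation

$$\lambda(t) = \beta\left(1 - \frac{\bar n_t}{a}\right). \tag{60}$$

This might have been expected, since the Verhulst-Pearl-Reed differential equation
(which forms the deterministic basis for the logistic law) is

$$\frac{dn}{dt} = \beta n\left(1 - \frac{n}{a}\right). \tag{61}$$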



7. “Periodic” birth-and-death processes. As a further example of the general
theory it is worth considering the “periodic” processes for which the expected
growth $\bar n_t$ is a function of the time which repeats itself with period $\omega$. It
will then follow that $\rho(t)$, and so also $\lambda(t) - \mu(t)$, have the period $\omega$, and that $\rho(t)$
must be zero whenever $t$ is an integer multiple of $\omega$. The processes to be considered
are those in which $\lambda$ and $\mu$ are separately periodic, and then it will be found that

$$\bar n_t = 1 \quad\text{and}\quad \mathrm{Var}(n_t) = k\int_0^\omega e^{\rho(\tau)}\{\lambda(\tau) + \mu(\tau)\}\,d\tau \tag{62}$$

whenever $t = k\omega$, for every positive integer $k$. Thus, although the expected
value of $n_t$ repeats itself regularly, in practice this “periodicity” would be obscured
by the rapid increase, with increasing $t$, in the magnitude of the random
fluctuations (as measured by $\mathrm{Var}(n_t)$). Moreover, since

$$\int_0^{k\omega} e^{\rho(\tau)}\mu(\tau)\,d\tau = k\int_0^\omega e^{\rho(\tau)}\mu(\tau)\,d\tau,$$

it is clear that the process is necessarily transient (unless $\mu$ vanishes identically),
there being unit probability that $n_t$ will ultimately be reduced to zero.

Periodic birth-and-death processes are likely to be of importance in biology;
it should be pointed out, however, that this type of process describes the stochastic
modification of a regular periodicity imposed on the model from outside, and
it is not to be confused with other stochastic models which themselves generate
irregular (non-phase-keeping) oscillations. The models discussed in this section
are in fact suitable for the quantitative description of seasonal influences.

Before going into further detail it is natural to specialise the model by assuming
that the functions $\lambda$ and $\mu$ are at most simply harmonic. If, further,
there is to be no damping, one will then have

$$\bar n_t = \exp\{a\sin\nu(t + \epsilon) - a\sin\nu\epsilon\} \qquad (a > 0), \tag{63}$$

where $\nu\omega = 2\pi$, and $a$ and $\epsilon$ are amplitude and phase constants, respectively.
The functions $\lambda$ and $\mu$ are now to be determined from the relation

$$\lambda - \mu = a\nu\cos\nu(t + \epsilon),$$

and this can be done in many ways. The minimum-fluctuation solution would
here be artificial, and it is more natural to select two other solutions,

$$\lambda = a\nu\{1 + \cos\nu(t + \epsilon)\}, \qquad \mu = a\nu, \tag{64}$$

and

$$\lambda = a\nu, \qquad \mu = a\nu\{1 - \cos\nu(t + \epsilon)\}, \tag{65}$$

for further consideration. In the first of these the death rate is constant and
the birth rate executes simple-harmonic oscillations, while in the second it is the
birth rate which is constant, and the death rate which oscillates. It can be seen
that, of all solutions of these two types, (64) and (65) are those with the least
value for $\mathrm{Var}(n_t)$. From formulae (14a) and (14b) it will be found that, for
either process,

$$\mathrm{Var}(n_t) = 4\pi k a\,I_0(a)\,e^{a\sin\nu\epsilon} \quad\text{when } t = k\omega, \tag{66}$$

where $I_0(a)$ is the Bessel function of zero order, of the first kind and of imaginary
argument. (It will be noticed that, whenever $t$ is an integer multiple of $\omega$, the
distribution of the population size $n_t$ is the same for the two models.) For small
oscillations, when $t = k\omega$,

$$\mathrm{Var}(n_t) \sim 4\pi k a \quad\text{as } a \to 0, \tag{67}$$

since $I_0(0) = 1$, while for large oscillations

$$\mathrm{Var}(n_t) \sim 2k(2\pi a)^{1/2}/\bar n_{\min} \quad\text{as } a \to \infty. \tag{68}$$

(Here $\bar n_{\min}$ is the minimum value of $\bar n_t$.)
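Formula (66) can be confirmed by evaluating (14c) numerically for model (64) at $t = k\omega$. In the Python sketch below, $\rho(t) = -a\{\sin\nu(t+\epsilon) - \sin\nu\epsilon\}$ follows from (63), $I_0$ is summed from its power series, and the function names and parameter values are illustrative assumptions. The trapezoidal rule is very accurate here because the integrand is smooth and periodic over whole periods.

```python
import math


def bessel_i0(a, terms=60):
    """Modified Bessel function I_0(a) = sum_j (a/2)^{2j} / (j!)^2 (power series)."""
    total, term = 0.0, 1.0
    for j in range(terms):
        total += term
        term *= (a / 2) ** 2 / (j + 1) ** 2
    return total


def periodic_variance(a, eps, k, omega=1.0, steps_per_period=4_000):
    """Var(n_t) at t = k*omega for model (64), evaluated from (14c)."""
    nu = 2 * math.pi / omega
    lam = lambda s: a * nu * (1 + math.cos(nu * (s + eps)))
    mu = lambda s: a * nu
    rho = lambda s: -a * (math.sin(nu * (s + eps)) - math.sin(nu * eps))
    n = k * steps_per_period
    h = k * omega / n
    f = lambda s: math.exp(rho(s)) * (lam(s) + mu(s))
    integral = h * (0.5 * f(0.0) + sum(f(i * h) for i in range(1, n)) + 0.5 * f(n * h))
    return math.exp(-2 * rho(n * h)) * integral


a, eps, k = 0.7, 0.1, 3
print(periodic_variance(a, eps, k),
      4 * math.pi * k * a * bessel_i0(a) * math.exp(a * math.sin(2 * math.pi * eps)))
```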

The calculation of $P_0(\omega)$ presents some points of interest. For either model
it proves to be

$$P_0(\omega) = \frac{2\pi a\,I_0(a)\,e^{a\sin\nu\epsilon}}{1 + 2\pi a\,I_0(a)\,e^{a\sin\nu\epsilon}}; \tag{69}$$

this is the probability that a population element, known to be descended from a
single individual at time $t = 0$, will have become extinct one year later (if one
identifies the oscillations with a seasonal effect). It will be seen that $P_0(\omega)$
will be least when $\sin\nu\epsilon = -1$, and greatest when $\sin\nu\epsilon = +1$, i.e. when
$\bar n_t$ is expected to have a minimum, or a maximum, at $t = 0$, respectively.
Accordingly it follows that the progeny of a new member of the population is most
likely to survive till the following year if the “ancestor” commences its “membership”
at the time when the population would normally have its minimum value.
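The phase dependence of (69) is easy to see numerically. The Python sketch below evaluates (69) and, as an independent check, the general formula (18) by quadrature for model (64), where $\mu = a\nu$ is constant; the function names, $\omega = 1$ and $a = 0.5$ are illustrative assumptions. With $\omega = 1$, $\epsilon = 0.75$ puts a trough of $\bar n_t$ at $t = 0$ and $\epsilon = 0.25$ a peak.

```python
import math


def bessel_i0(a, terms=60):
    """I_0(a) by its power series sum_j (a/2)^{2j} / (j!)^2."""
    total, term = 0.0, 1.0
    for j in range(terms):
        total += term
        term *= (a / 2) ** 2 / (j + 1) ** 2
    return total


def p0_one_period(a, sin_nu_eps):
    """P_0(omega) from (69), for either model (64) or (65)."""
    z = 2 * math.pi * a * bessel_i0(a) * math.exp(a * sin_nu_eps)
    return z / (1 + z)


def p0_by_quadrature(a, eps, omega=1.0, steps=4_000):
    """The same quantity from (18) for model (64), where mu = a*nu is constant."""
    nu = 2 * math.pi / omega
    rho = lambda s: -a * (math.sin(nu * (s + eps)) - math.sin(nu * eps))
    h = omega / steps
    f = lambda s: math.exp(rho(s)) * a * nu
    I = h * (0.5 * f(0.0) + sum(f(i * h) for i in range(1, steps)) + 0.5 * f(omega))
    return I / (1 + I)


a = 0.5
for label, eps in (("trough at t = 0", 0.75), ("peak at t = 0", 0.25)):
    print(label, p0_one_period(a, math.sin(2 * math.pi * eps)), p0_by_quadrature(a, eps))
```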
PROBABILITY OF COINCIDENCE FOR TWO PERIODICALLY
RECURRING EVENTS¹

By Paul I. Richards

Brookhaven National Laboratory

Summary. This paper contains a study of the following problem: Each of
two events recurs with definitely known period and duration, while the starting
time of each event is unknown. It is desired that, before the elapse of a certain
time, the events occur simultaneously and that this “overlap” be of at least a
given minimum duration.

The probability of this satisfactory coincidence is first evaluated, and it is
found that the solution, while mathematically adequate, is of no value for practical
application. This circumstance arises from the possibility that, with
certain rational ratios of the periods, the events may “lock in step”. Accordingly,
an attempt is made to smooth the probability function with respect to
small variations in the ratio of the periods. Due to difficulties in manipulating
the number-theoretic expressions involved, this smoothing is carried through
only by the use of certain approximations. Moreover, because of these same
difficulties, an averaged value of the probability itself is not obtained, but, in
its stead, there is derived a formula for that fraction of randomly related repeated
trials in which the original probability will be less than one-half.

Thus, the original problem is not completely solved. The results obtained,
however, do allow one to compare the relative advantages of different situations
and to make a rough estimate of the likelihood of success. Generally speaking,
the analysis is applicable whenever the ratio of “on time” to “off time” is small
for each event.


1. Introduction. Our problem may be represented schematically as follows:
Consider two pulse waves (Fig. 1) of periods $T_1$, $T_2$, pulse widths $t_1$, $t_2$, and
phases $\phi_1$, $\phi_2$. It is desired that these pulses overlap at least once within a given
time interval; moreover, an overlap is not satisfactory unless its duration is at
least as great as some assigned $t_m$. The starting phases $\phi_1$ and $\phi_2$ are unknown
for both waves. Our problem, then, would appear to be to calculate as a function
of time the probability of at least one overlap of duration at least $t_m$.

[…] mathematically adequate, […] applied mathematics arises from sources generally
kept in mind only by experts. […] which, involving as it does at some stage the use
of the human senses, precludes […] values of the parameters of the problem. In
other words, although experimental error can sometimes be made amazingly small, […].

[…] the possibility that the waves may “lock in step” […] erratic with respect to very
minute changes in the periods $T_1$, $T_2$. For example, let $T_1 = T_2$ […] ($t_m = 0$); a simple
calculation then shows that, for all times greater than $T_1 = T_2$, the desired probability
is […]. Now if we let $T_1$ […], one wave will “creep up” on the other, and eventually,
for times greater than […], the probability is unity. Thus it may very well happen in
a practical application that the parameters are known to an accuracy experimentally
sufficient only to give the obvious result […].

¹ […] assumes no responsibility for the accuracy of the statements contained herein.

In the practical problem originally considered, uncertainty in the data arose
not only from experimental error but also from slight instability of equipment.
Thus some averaging over variations in the periods had to be found
if the results were to be of any practical value whatsoever.

For reasons which will appear in the later analysis, this smoothing entails
difficulties which the author was unable to overcome with any great success; the
nature of the results which have been obtained is discussed in the next section.

These results involve several approximations which, generally speaking, are
We shall continue to use the notation already introduced:

$$
\begin{aligned}
t_1, t_2 &= \text{durations of the events};\\
T_1, T_2 &= \text{periods of the events};\\
t_m &= \text{minimum satisfactory duration of coincidence; and}\\
P &= \text{probability of at least one satisfactory coincidence.}
\end{aligned}
\tag{1}
$$

We shall also use the (at present) rather arbitrary notation:

$$
\begin{aligned}
t &= (\text{time} - t_m),\\
P_0 &= (t_1 - t_m)(t_2 - t_m)/T_1T_2,\\
w &= (t_1 + t_2 - 2t_m)/T_1T_2.
\end{aligned}
\tag{2}
$$

The probability function for short time intervals is:

$$P = P_0 + wt, \qquad \text{for } t < \mathrm{Max}(T_1, T_2). \tag{3}$$

In any case:

$$P \le P_0 + wt. \tag{4}$$


As already explained, the functional dependence of $P$ for large $t$ is of no practical
use due to its extremely erratic variation with small changes in the periods
$T_1$, $T_2$.
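The short-time formula (3) can be checked by simulation. The Python sketch below rests on explicit assumptions not stated in the paper: pulse $k$ is taken to be on during $[\phi_k + jT_k,\ \phi_k + jT_k + t_k)$ for all integers $j$, the phases are independent and uniform over one period, and the parameter values are illustrative. It uses $t$ below both periods, where each wave meets the observation window at most once and (3) should hold exactly; the function names are hypothetical.

```python
import math
import random


def p0_and_w(T1, T2, t1, t2, tm):
    """P_0 and w as defined in (2)."""
    return ((t1 - tm) * (t2 - tm) / (T1 * T2),
            (t1 + t2 - 2 * tm) / (T1 * T2))


def on_intervals(phase, period, width, horizon):
    """Pulses [phase + j*period, phase + j*period + width), clipped to [0, horizon]."""
    out = []
    j = math.floor((-width - phase) / period)       # first pulse that could reach time 0
    while phase + j * period < horizon:
        start = phase + j * period
        lo, hi = max(start, 0.0), min(start + width, horizon)
        if hi > lo:
            out.append((lo, hi))
        j += 1
    return out


def has_satisfactory_overlap(phi1, phi2, T1, T2, t1, t2, tm, horizon):
    """True if both pulses are on together for at least tm within [0, horizon]."""
    for a_lo, a_hi in on_intervals(phi1, T1, t1, horizon):
        for b_lo, b_hi in on_intervals(phi2, T2, t2, horizon):
            if min(a_hi, b_hi) - max(a_lo, b_lo) >= tm:
                return True
    return False


def estimate_P(T1, T2, t1, t2, tm, t, trials=100_000, seed=12345):
    """Monte Carlo estimate of P with uniformly random starting phases."""
    rng = random.Random(seed)
    horizon = t + tm                                # t = time - t_m, as in (2)
    hits = sum(has_satisfactory_overlap(rng.uniform(0, T1), rng.uniform(0, T2),
                                        T1, T2, t1, t2, tm, horizon)
               for _ in range(trials))
    return hits / trials


T1, T2, t1, t2, tm, t = 10.0, 7.0, 2.0, 1.5, 0.5, 3.0
P0, w = p0_and_w(T1, T2, t1, t2, tm)
print(estimate_P(T1, T2, t1, t2, tm, t), P0 + w * t)   # (3) gives 9/70, about 0.1286
```

Because successive pulses of one wave never touch (width below period), every connected stretch of joint "on" time is the intersection of one pulse from each wave, which is why the pairwise check is sufficient.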

For reasons which will later become apparent, the only type of averaging which
has yet been carried to completion is the following. Consider that many trials
of equal length are made and that in each individual trial, all the parameters
are, by some mysterious device, held constant with absolute, mathematical
exactitude. Assume for definiteness that $T_2 < T_1$. Between different trials,
let $T_1$ and $T_2$ vary in such a way that $T_1/T_2$ takes all values within a range of
[…] with equal probability. (In the original problem, the ratios $t_i/T_i$ necessarily
remained constant.) The quantity $f$ given below then represents that fraction
of trials in which $P < P_0 + Q$ […]. Thus the smaller $f$, the greater are the chances of success.

It must be admitted that this method assumes several things which are not
true in practice. First, the parameters of the problem […] practical problem […]
until a more adequate analysis can be carried through.

The reader will note that the formulas have the form of a “probability of a
probability”. It would thus seem that a simple integration would yield a true
probability, but, unfortunately, the formulas for $f$ are reasonably accurate only
for […]. The desired formula for $f$ = fraction of trials in which $P < P_0 + Q$ is:

$$f = 1 \quad\text{for } tw < Q; \qquad f = [\ldots] \quad\text{for } tw > Q,\ Q < 1/2. \tag{5}$$

This expression is subject to error from several sources. First, it is an approximation
to a number-theoretic formula given in (31); this approximation is best
for $t$ and $Q/w$ large compared to $\mathrm{Max}(T_1, T_2)$. A completely general comparison
of (31) and (33) = (5) is given in Fig. 2, where the agreement will be seen to be
quite adequate even for relatively small $t$ and $Q/w$. (The dotted contours are
straight lines passing through the origin.) When $t$ and $Q/w$ are small this first
source of error can be eliminated by using the solid contours of Fig. 2 in place
of (5).

Secondly, formula (31) itself is an approximation and involves the use of
simplified probability formulas and an assumption that $T_1$ and $w$ are constant
[…]. The maximum possible magnitude of these errors in (31) is given
by (parentheses indicate functional dependence):

$$f(tw_2,\ Q - \delta - q) \le f(tw, Q) \le f(tw_1,\ Q + \delta + q), \tag{6}$$

where, as $T_1$ varies,

$w_1$, $w_2$ = minimum, maximum values of $w$;
$\delta$ = change in $P_0$;
$q$ = maximum value of […].

Generally speaking, these errors are small if the $t_i/T_i$ are small and if $t$ is large
compared to $\mathrm{Max}(T_1, T_2)$. Also, there is considerable possibility that certain errors
will cancel in such a way as to make (6) correct with $q = 0$.

We shall now outline the practical use of these results. Given nominal values of the parameters defined in (1), choose a convenient value for Q < 1/2 (usually Q = …), and substitute into (2) to find tw/Q. From (5), one may then determine f = fraction of trials in which P < P0 + Q. (Low values of f are thus desirable.) For computational convenience, (5) has been plotted in Fig. 3, while, above the range of Fig. 3, the following lies within 1% of (5):

$$f \approx 0.608\,\frac{Q^2}{tw} \qquad \text{for } tw > 10Q. \qquad (7)$$
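For readers who want to evaluate (5) and (7) numerically, the short Python sketch below computes both. It is an illustration under stated assumptions, not part of the original analysis: it takes "log" in (5) to be the natural logarithm (this is what makes the leading term of (5) for large tw/Q equal to (1.216/2)Q²/(tw) = 0.608Q²/(tw)), and the names f_eq5 and f_eq7 are hypothetical.

```python
import math


def f_eq5(tw: float, Q: float) -> float:
    """Fraction f of trials with P < P0 + Q, following (5).

    tw is the dimensionless product t*w, and Q is assumed to lie in (0, 1/2).
    The logarithm is taken to be natural (an assumption of this sketch).
    """
    if tw <= Q:
        return 1.0
    x = Q / tw  # lies in (0, 1) when tw > Q; note 1/x - 1 = tw/Q - 1
    return 1.216 * Q * (1.0 + (1.0 / x - 1.0) * math.log(1.0 - x))


def f_eq7(tw: float, Q: float) -> float:
    """Large-tw form (7): f is approximately 0.608 * Q**2 / (tw)."""
    return 0.608 * Q * Q / tw


if __name__ == "__main__":
    for tw in (0.4, 1.0, 2.0, 10.0):
        print(f"tw={tw:5.1f}  f(5)={f_eq5(tw, 0.2):.5f}  f(7)={f_eq7(tw, 0.2):.5f}")
```

For tw = 0.4 and Q = 0.2, for instance, (5) gives f of about 0.075, and the ratio of (7) to (5) approaches 1 as tw/Q grows.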

Note also that (4) may often be of considerable use in quickly eliminating cases of very poor probability, and recall also that (3) will give the true, directly meaningful probability whenever t is no greater than Max(T1, T2).

Evaluation of the maximum possible error in f as so obtained is more complicated. If t and Q/w are small, Fig. 2 may be used to eliminate inexactness due to the approximation of (31) by (5) = (33). Otherwise, this error may safely be assumed to be negligible (less than 0.025); alternatively, (31) may be employed directly, but this is laborious unless Q/w is small. The remaining errors, given by (6), may change depending on how T2 is assumed to vary. To make these bounds as close as possible, it is best to choose T2 = Min(T1, T2) and then let



Fig. 2. Contours of f/Q: solid, (31); dotted, (33).


jar ib nomm “' **» T , /T , to 

ho V* r iump “ ‘ m 

in tne r ^ u ;; 

? * *> and merely means that the “lock in” IIZ T *J " $ w 

have an effect when t becomes greater than qJw *** ^ abl ° t0 
3. The probability function. Our problem has already been represented by the pulse waves of Fig. 1. The starting phases φ1, φ2 of the waves are random, and we desire the probability P of at least one overlap of duration at least t0 within a given time interval. Manifestly P = 0 until time t0; hence we shall give t the meaning already assigned in (2).

Consider any sub-interval of width t0. The range of phases favorable to satisfactory coincidence on this interval is easily seen to be a rectangle with sides (t1 - t0), (t2 - t0) in the phase plane (φ1, φ2). By proper choice of the (arbitrary) zero-phase reference, the small rectangle favorable to coincidence on (0, t0) can be made to fall in the lower left corner of the phase plane (Fig. 4).



As we allow the sub-interval (width t0) to advance in time, this small rectangle will sweep out along a 45° line (Fig. 4); its horizontal displacement (equal to its vertical displacement) is given by t as defined in (2). Since the phases must be measured modulo the periods, we must "switch back" the strip whenever it begins to leave the large rectangle 0 ≤ φ1 < T1, 0 ≤ φ2 < T2; this is illustrated in Fig. 5.

The desired probability is then the area covered at least once by the strip divided by T1T2, the total available area of the phase plane.

Using Fig. 4, one can easily show that, before the strip begins to overlap itself,

$$P = P_0 + wt, \qquad (8)$$

where t, P0, w are defined in (2).

A rectangle with opposite sides identified, as in Fig. 5, is topologically equivalent to a torus. This gives a good geometric picture of the overlap phenomena.
The strip winds diagonally about the torus until eventually (in general after several full circuits) it strikes sufficiently near its starting point to overlap itself on one edge. It then begins to fill the chinks between the previous circuits, and this single overlap continues until the chinks are almost filled. The strip then approaches its starting point from the side opposite to that on which single overlap occurred. Thereafter, only the center section of the strip is effective in increasing the area covered. This double overlap continues until the entire torus has been covered. A degenerate case is possible in which the strip, upon its first overlap, begins to retrace exactly its former path and the torus is never fully covered. This corresponds to interlocking of the original waves of Fig. 1.

A rigorous proof of the above statements may be constructed by using the fact that each change in behavior can occur only at the starting point. In this manner, it is easily shown that (a) single and double overlap occur in that order, (b) the strip area effective in covering changes only upon a change in the type of overlap, and (c) the two types of overlap must occur on opposite sides of the starting point.

The facts (a, b, c) may then be used to derive the probability function. For the analytic treatment, it is best to return to the (φ1, φ2) plane. Overlap of any type will first occur when the "unswitched-back" strip approaches sufficiently near a point (n1T1, n2T2), where n1 and n2 are non-negative integers not both zero. The analysis is greatly shortened by noticing that the behavior is completely determined by the distance of the line φ1 = φ2 from such points (even though the strip is not centered on this line), while the width of the strip is (Fig. 4) wT1T2/√2.

A slight fine-structure may arise in the probability function where it changes slope, depending on whether or not the leading corner of the moving rectangle strikes one of the sides of the original small rectangle. These effects are small if the ti/Ti are small and will be neglected below by supposing the strip to be generated by a line segment oriented perpendicularly to its path. The error arising from this procedure consists essentially in a delay or advance in the time at which P changes slope. It may be seen that the maximum effect represents a delay Δt [...]. The error introduced is then less than Δt/√2 multiplied by that portion of the total width of the strip which becomes ineffective due to the overlap considered. The sum of these effects must be less than that given by using the total width of the strip; this gives the maximum error wT1T2/2.

The results of the method outlined are then as follows. Single overlap occurs at t = s, where

$$s = \tfrac{1}{2}(n_1T_1 + n_2T_2), \qquad (9)$$

and (n1, n2) is that pair of non-negative integers not both zero such that s is a minimum and


$$p_1 = \frac{|n_1T_1 - n_2T_2|}{T_1T_2} < w. \qquad (10)$$

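Double overlap occurs at t = d, where

$$d = \tfrac{1}{2}(N_1T_1 + N_2T_2), \qquad (11)$$

and (N1, N2) is that pair of non-negative integers not both zero such that d is a minimum and the conditions

$$\frac{|N_1T_1 - N_2T_2|}{T_1T_2} < w, \qquad (n_1T_1 - n_2T_2)(N_1T_1 - N_2T_2) < 0 \qquad (12)$$

are satisfied.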
$$p_2 = p_1 + \frac{|N_1T_1 - N_2T_2|}{T_1T_2} - w. \qquad (13)$$

The probability function is then

$$P = \begin{cases} P_0 + tw & \text{for } t \le s,\\ P_0 + sw + (t - s)\,p_1 & \text{for } s < t \le d,\\ P_0 + sw + (d - s)\,p_1 + (t - d)\,p_2 & \text{for } d < t,\end{cases} \qquad (14)$$

where it is understood that P = 1 if (14) gives P > 1.
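The piecewise form of (14) is easy to evaluate once s, d, p1 and p2 are known. The sketch below assumes those quantities have already been found from (9)-(13); the function name and argument order are illustrative only, and passing d=None models the interlock case discussed next, in which d does not exist.

```python
from typing import Optional


def coincidence_probability(t: float, P0: float, w: float, s: float, p1: float,
                            d: Optional[float] = None, p2: float = 0.0) -> float:
    """Piecewise-linear probability function (14), capped at 1.

    s and d are the single- and double-overlap times of (9) and (11);
    d=None stands for the interlock case in which d does not exist.
    """
    if t <= s:
        P = P0 + w * t                                 # no overlap yet: slope w
    elif d is None or t <= d:
        P = P0 + s * w + (t - s) * p1                  # single overlap: slope p1
    else:
        P = P0 + s * w + (d - s) * p1 + (t - d) * p2   # double overlap: slope p2
    return min(P, 1.0)                                 # the convention P = 1 if (14) exceeds 1


if __name__ == "__main__":
    for t in (2, 10, 30, 1000):
        print(t, coincidence_probability(t, 0.01, 0.02, 5, 0.01, 20, 0.005))
```

With p1 = 0 and d=None the function levels off at P0 + sw, which is the interlock behavior described in the text.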

The degenerate case where the waves interlock is given correctly by this formalism. Namely, if the strip starts to retrace its path exactly, then p1 = 0 and the second part of (12) shows that d does not exist. Equation (14) then gives the correct result: P rises to the value P0 + sw and never increases further.


4. The method of smoothing. We have already discussed in section 1 the inadequacy of the formal mathematical solution (14) for purposes of practical application. Either mathematical analysis or intuitive consideration of interlock shows that the erratic behavior of P is due almost entirely to small changes in the ratio T1/T2. As this ratio passes through certain rational values, possibilities of interlock appear and disappear. Consequently, we next alter (14) to a form in which the dependence on this ratio is more evident.

We may, without loss of generality, assume:

$$T_1 = 1, \qquad T_2 \le 1. \qquad (15)$$

Also introduce the standard notation:

$$[x] = \text{largest integer} \le x. \qquad (16)$$

It will then be seen that (10) and (12) may be thrown into the form:

$$k = \text{smallest positive integer such that } p_1 = |ke - i| < w \quad (i = \text{integer}); \qquad (17)$$

$$K = \text{smallest positive integer such that } |Ke - I| < w \text{ and also } (ke - i)(Ke - I) < 0 \quad (I = \text{integer}), \qquad (18)$$

where either 



Now from (9) and (10), we note that s differs from kT1 by at most wT1T2/2, while from (11) and (12), d differs from KT1 by less than the same amount. Moreover, by the second half of (12), d is thereby made too small if s has been made too large, and vice versa. Hence the use of these approximations in (14) will contribute an error certainly less than wT1T2/2. Adding the error discussed in section 3, the total introduced thus far cannot exceed wT1T2.

We thus use in the present notation s = k, d = K; (13) and (14) then become:

$$p_2 = p_1 + |Ke - I| - w, \qquad (20)$$

$$\text{(a)}\quad P = P_0 + wt \qquad \text{for } t \le k,$$

$$\text{(b)}\quad P = P_0 + kw + (t - k)\,p_1 \qquad \text{for } k < t \le K, \qquad (21)$$

$$\text{(c)}\quad P = P_0 + kw + (K - k)\,p_1 + (t - K)\,p_2 \qquad \text{for } K < t,$$

where, as before, P = 1 if (21) gives a value greater than unity. Equations (17)-(21) are the formulation which will be used, with conditions (15), henceforth.
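Conditions (17) and (18) can be checked by direct search over the multiples of e, which is a convenient way to test the formulation on particular cases. The following Python sketch is a brute-force illustration only (the name overlap_indices and the search limit nmax are assumptions): it measures each residual from the nearest integer and returns k, p1, K and p2 as given by (17), (18) and (20).

```python
from typing import Optional, Tuple


def overlap_indices(e: float, w: float, nmax: int = 100_000
                    ) -> Tuple[Optional[int], Optional[float], Optional[int], Optional[float]]:
    """Brute-force search for k and p1 of (17), K of (18) and p2 of (20).

    Residuals n*e - I are measured from the nearest integer I. Returns
    (k, p1, K, p2); K and p2 are None when no admissible K <= nmax exists.
    """
    k = p1 = r1 = None
    for n in range(1, nmax + 1):
        r = n * e - round(n * e)          # signed distance of n*e from the nearest integer
        if abs(r) < w:
            k, p1, r1 = n, abs(r), r
            break
    if k is None:
        return None, None, None, None
    for n in range(k + 1, nmax + 1):
        r = n * e - round(n * e)
        if abs(r) < w and r * r1 < 0:     # second condition of (18): opposite sign
            return k, p1, n, p1 + abs(r) - w   # p2 from (20)
    return k, p1, None, None                   # e.g. interlock, where p1 = 0


if __name__ == "__main__":
    print(overlap_indices(2 / 7 + 0.018 / 7, 0.02))
```

For e = 2/7 + 0.018/7 and w = 0.02 the search finds k = 7 and K = 52; for a rational e that produces interlock, such as e = 1/4, p1 = 0 and no K exists.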

We wish now to smooth P with respect to variations in e. The number-theoretic requirement (17) is extremely difficult to work with. For reasons of simplicity, then, we shall assume that e is the only parameter which changes as T2 is varied. The errors which may arise from this assumption are treated at the end of section 5.

Note that, although the periods appear explicitly only in (19) hereafter, all the formulas are true only for T1 = 1 (as is evident if we recall that w has the dimensions of inverse time). Thus we are definitely assuming that T1 = constant.

From (19) or from the absolute value signs in (17), (18), it will be seen that all possible situations arise if e varies merely from zero to one-half. In order that this should entail as little variation in T2 as possible, our conventions should be chosen as already stated in (15). Even under these circumstances, a maximum variation of 33% in T2 may be required to cover the range e = 0 to 1/2.

Equation (21) cannot be used directly without the interpretational convention there noted. This leads to difficulties of treatment which the author was unable to solve. The difficulties may be avoided by the following device, which admittedly has less direct significance than an averaged value for P.

We enquire after the fraction f of the range of e over which P has a value (at fixed t) less than some given value Q + P0. We may then say that, if a large number of trials each of length t is made, then in f of them the probability of coincidence will be less than Q + P0.

5. Calculation of f. The exceptional behavior of P is that caused by interlock possibilities. This corresponds to p1 = 0 in (17). Thus the exceptional values of P center about the points e = i/k, where i and k are relatively prime (otherwise, k would not be the smallest integer satisfying (17)). Moreover, by a standard theorem [1], k ≤ 1/w. Thus the critical points form the Farey series of order 1/w in the range (0, 1/2). About each Farey point, we may suspect that there will be an interval over which k is constant, and that the entire range may thereby be divided up into ranges of constant k.

In thinking about the use of (17) in a typical calculation, it is convenient to eliminate the integer i by representing multiples of e as a series of points progressing around and around a circle of unit circumference. When e = i/k, the kth multiple will (after i revolutions) coincide with the origin; this and the earlier points, it is easily shown, will be distributed uniformly about the circle with a separation 1/k.

As e moves away from the Farey point, k will, by definition (17), remain constant until either (a) the point ke moves a distance greater than w from the origin or (b) an earlier point moves to a distance less than w from the origin (Fig. 6).

Let me (the mth multiple of e) be that earlier point nearest (initially 1/k from) the origin and moving toward it as e varies in a particular direction. Of course,

$$m < k. \qquad (22)$$

For each Farey point, there will be two values of m: one for decreasing e and one for increasing e. If we introduce the new variable h = the absolute value of the change in e from the Farey point i/k, then each point ne on the reference circle will move a distance nh, and (17) gives as the conditions for constant k (Fig. 7):

$$\text{(a)}\quad w > kh = p_1, \qquad\qquad \text{(b)}\quad mh < \frac{1}{k} - w. \qquad (23)$$
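The bounds (23) can be checked numerically around a single Farey point. In the sketch below (hypothetical helper names; Python 3.8+ is needed for the modular inverse `pow(i, -1, k)`), m is identified as the multiple with m·i ≡ -1 (mod k) when e increases and m·i ≡ +1 (mod k) when e decreases, which is our reading of the definition of m above; a direct search then confirms that k stays constant inside the bound and changes just outside it.

```python
def smallest_k(e: float, w: float, nmax: int = 10_000):
    """k of (17): smallest positive n with |n*e - nearest integer| < w."""
    for n in range(1, nmax + 1):
        if abs(n * e - round(n * e)) < w:
            return n
    return None


def constant_k_halfwidth(i: int, k: int, w: float, increasing: bool = True) -> float:
    """Largest h allowed by (23) on one side of the Farey point i/k.

    Assumes k >= 2 and gcd(i, k) = 1. The point me lies 1/k from the origin
    and moves toward it: m*i = -1 (mod k) when e increases, +1 when it decreases.
    """
    inv = pow(i, -1, k)                   # modular inverse of i mod k
    m = (-inv) % k if increasing else inv
    return min(w / k, (1.0 / k - w) / m)  # (23a) and (23b)


if __name__ == "__main__":
    hw = constant_k_halfwidth(2, 7, 0.02)
    print(hw, smallest_k(2 / 7 + 0.9 * hw, 0.02), smallest_k(2 / 7 + 1.1 * hw, 0.02))
```

With k well inside (28), as here, condition (23a) is the binding one, in line with the later argument that (29) implies (23).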
Thus we have divided the range (0, 1/2) into small ranges where k (and m) are fixed. The number of small ranges is roughly twice the number of Farey points.

Fig. 6.

Within each small range p1, K, p2 still vary with e. The behavior of p1 is already given in (23a); we shall find that we do not need p2. Using (18) and Fig. 7, it may easily be shown that:

$$K = m + jk + k, \qquad (24)$$

where

$$j + \alpha = \frac{1 - mkh - kw}{k^2h}, \qquad j = [j + \alpha], \qquad 0 \le \alpha < 1. \qquad (25)$$

From (23a), (24), (25), we obtain:

$$(K - k)\,p_1 = 1 - kw - \alpha k^2 h \qquad (0 \le \alpha < 1). \qquad (26)$$
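Relations (24)-(26) can likewise be compared with a direct search for K. The sketch below does this for e slightly above a Farey point i/k, using the same reading of m as the multiple with m·i ≡ -1 (mod k); the function names are illustrative and the comparison is a numerical sanity test, not a proof.

```python
import math


def first_index(e, w, start=1, sign=0.0, nmax=100_000):
    """Smallest n >= start with |n*e - I| < w (I the nearest integer); if sign
    is nonzero, the residual n*e - I must also have that sign."""
    for n in range(start, nmax + 1):
        r = n * e - round(n * e)
        if abs(r) < w and (sign == 0.0 or r * sign > 0):
            return n, r
    return None, None


def check_24_to_26(i, k, w, h):
    """Return (K by search, K by (24), (K - k) p1, right side of (26))
    for e = i/k + h, i.e. for increasing e."""
    e = i / k + h
    k_found, r1 = first_index(e, w)
    if k_found != k:
        raise ValueError("h lies outside the constant-k region (23)")
    K_search, _ = first_index(e, w, start=k + 1, sign=-r1)  # opposite sign, as in (18)
    m = (-pow(i, -1, k)) % k                                # earlier point approaching 0
    j_plus_alpha = (1 - m * k * h - k * w) / (k * k * h)    # (25)
    j = math.floor(j_plus_alpha)
    alpha = j_plus_alpha - j
    K_formula = m + j * k + k                               # (24)
    p1 = abs(r1)
    return K_search, K_formula, (K_search - k) * p1, 1 - k * w - alpha * k * k * h


if __name__ == "__main__":
    print(check_24_to_26(2, 7, 0.02, 0.9 * 0.02 / 7))
```

For i/k = 2/7, w = 0.02 and h = 0.9w/k, both the search and (24) give K = 52, and both sides of (26) equal 0.81.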

Having thus divided the range of e into small regions within each of which the number-theoretic requirements (17, 18) take a relatively simple form, we must now turn to the calculation of f = that fraction of the range e = (0, 1/2) over which P < P0 + Q at fixed t. We shall specialize the further analysis to the case Q < 1/2. This considerably shortens the discussion and yields essentially all the useful results of the more general inquiry.

We first note from (21) that, since p2 ≤ p1 < w (i.e. because of (4)), we have P < P0 + Q independently of e if t < Q/w, so that




PROBABILITY OF COINCIDENCE


$$f = 1 \qquad \text{for } t < Q/w. \qquad (27)$$

Similar reasoning shows, on the other hand, that when t > Q/w, those regions with k > Q/w do not contribute to f. In the following, we shall therefore employ:

$$k < Q/w < t, \qquad Q < \tfrac{1}{2}. \qquad (28)$$

Equation (28) implies that we must use either (21b) or (21c); we shall next show that we do not need (21c). The value of P whenever (21c) is applicable is certainly greater than (P0 + kw + (K - k)p1). From (26), this value is equal to (P0 + 1 - αk²h). Now from (28), kw < Q < 1/2, whence by (23a) h < w/k < 1/(2k²), so that αk²h < 1/2. Thus (P0 + 1 - αk²h) > P0 + 1/2 > P0 + Q, and consequently (21c) never applies until P > P0 + Q. (This means merely that the double overlap discussed in section 3 cannot occur until at least half the torus is covered.) Accordingly, we can confine our attention entirely to (21b) in any further discussion of f.



Substituting for p1 from (23a) and recalling that (t - k) is positive (by (28)), we find from (21b) that the condition P < P0 + Q becomes:

$$h < \frac{Q - kw}{k(t - k)}. \qquad (29)$$

However, h is subject also to the restrictions (23), which ensure that we do not stray from the small region where k is constant. We assert that (29) implies (23) and may therefore be used as the final expression of the requirement P < P0 + Q.

To prove this, note first that (29) and (28) immediately give h < w/k, which is (23a). Secondly, (28) implies 1/k > 2w, so that, using (23a) and (22), (1/k) - w > w > kh > mh, which is (23b).

Thus we arrive at the result that f receives contributions only from those elementary regions where k satisfies (28), and that the contribution of each such region is governed by (29).

Since the variable h was defined as the absolute value of the change of e from the Farey point i/k, each Farey point (satisfying (28)) contributes an amount equal to twice³ the right-hand side of (29). Since this amount is independent of i, we may immediately sum over all Farey points i/k with fixed k. There are (1/2)φ(k) such points³ in the range (0, 1/2), where Euler's function φ is defined by:

$$\varphi(k) = \text{the number of positive integers} \le k \text{ relatively prime to } k. \qquad (30)$$

(Note that φ(k) is even for k ≥ 3, since if k and i have no common divisor > 1, neither do k and k - i.)

³ This is not true of the Farey points 0 and 1/2, the ends of the range of e, but the terms k = 1, 2 in (31) correctly account for these contributions, since φ(1) = φ(2) = 1.
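The count of (1/2)φ(k) Farey points i/k in (0, 1/2) for k ≥ 3 is easy to confirm by enumeration. This small Python check (illustrative names) computes φ(k) directly from definition (30) and lists the numerators i for which i/k is in lowest terms and below 1/2.

```python
from math import gcd


def euler_phi(k: int) -> int:
    """(30): number of positive integers <= k that are relatively prime to k."""
    return sum(1 for i in range(1, k + 1) if gcd(i, k) == 1)


def farey_points_below_half(k: int):
    """Numerators i with i/k in lowest terms and 0 < i/k < 1/2."""
    return [i for i in range(1, k) if gcd(i, k) == 1 and 2 * i < k]


if __name__ == "__main__":
    for k in range(1, 9):
        print(k, euler_phi(k), farey_points_below_half(k))
```

The pairing i with k - i used in the text is exactly why the list for k ≥ 3 has half as many entries as φ(k).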

Thus, summing over all these contributions and dividing by the length of the total range:

$$f = 2\sum_{1 \le k < Q/w} \frac{\varphi(k)\,(Q - kw)}{k\,(t - k)} \qquad \text{for } t > Q/w. \qquad (31)$$
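Formula (31) is straightforward to evaluate by computer, which also makes it easy to compare with the smoothed form (33). The sketch below is our illustration, not the paper's procedure: it assumes the normalization (15) (T1 = 1, so k and t are measured in units of T1), sums over 1 ≤ k < Q/w as required by (28), computes φ(k) with a standard sieve, and takes the logarithm in (33) as natural.

```python
import math


def totients(n: int):
    """Euler's phi(0..n) by a sieve; phi[k] as defined in (30)."""
    phi = list(range(n + 1))
    for p in range(2, n + 1):
        if phi[p] == p:                     # untouched so far, hence p is prime
            for q in range(p, n + 1, p):
                phi[q] -= phi[q] // p
    return phi


def f_31(t: float, w: float, Q: float) -> float:
    """Farey-sum formula (31), with the normalization T1 = 1 of (15)."""
    if t <= Q / w:
        return 1.0                          # (27)
    kmax = int(Q / w)
    phi = totients(max(kmax, 1))
    return 2.0 * sum(phi[k] * (Q - k * w) / (k * (t - k))
                     for k in range(1, kmax + 1) if Q - k * w > 0)


def f_33(t: float, w: float, Q: float) -> float:
    """Smoothed approximation (33), i.e. (5) of section 2 (natural log assumed)."""
    tw = t * w
    if tw <= Q:
        return 1.0
    return 1.216 * Q * (1.0 + (tw / Q - 1.0) * math.log(1.0 - Q / tw))


if __name__ == "__main__":
    print(f_31(80.0, 0.005, 0.2), f_33(80.0, 0.005, 0.2))
```

For t = 80, w = 0.005 and Q = 0.2 (so Q/w = 40 and tw = 0.4) the two formulas agree closely, to about one percent or better, consistent with the remark in section 6 on the agreement of (31) and (33).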

Regarding error in (31) due to the inaccuracy of (21), note that this can enter only when we set P = P0 + Q in deriving (29). Actually the difference between (21b) and the correct value of P will change as e is changed, so that there is considerable possibility that these effects will cancel out in (31). (In fact, a detailed study shows that the error in (21b) assumes opposite signs as e varies in opposite directions from any given Farey point.) In any case, because (31) is monotone in Q, the error in (31) can be no greater than that found by substituting Q ± wT1T2 for Q. Taking account also of the variation of P0 with T2, the same argument establishes the "Q-dependence" of (6) given in section 2.

Finally, we investigate the error due to change in w with T2. If $\bar{w}$ is the maximum value of w, Farey points with k < Q/$\bar{w}$ are certain to contribute to f, and this contribution will be at least as great as (Q - k$\bar{w}$)/k(t - k), so that f ≥ f($\bar{w}$). On the other hand, if $\underline{w}$ is the minimum value of w, Farey points with k > Q/$\underline{w}$ cannot possibly contribute to f, and the remaining points can contribute no more than (Q - k$\underline{w}$)/k(t - k), so that f ≤ f($\underline{w}$). Hence we arrive at the final statement (6) in section 2.

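6. Approximations for f. Computational difficulties in the use of (31) suggested approximating it by a more readily computed expression. By a standard theorem [1, p. 206], on the average

$$\varphi(k) \approx \frac{6k}{\pi^2}. \qquad (32)$$

We may then approximate (31) by (note that 2 · 6/π² ≈ 1.216):

$$f \approx 1.216\int_{1}^{Q/w}\frac{Q - kw}{t - k}\,dk = 1.216\,Q\left[1 - \frac{w}{Q} + \frac{tw - Q}{Q}\log\frac{tw - Q}{tw - w}\right].$$

If Q/w is large compared to 1 (recall t > Q/w), this becomes very nearly

$$f \approx 1.216\,Q\left[1 + \left(\frac{tw}{Q} - 1\right)\log\left(1 - \frac{Q}{tw}\right)\right] \qquad \text{for } tw > Q. \qquad (33)$$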

Despite the cavalier derivation of (33), its agreement with (31) is remarkably close. Fig. 2 shows a perfectly general comparison of (31) and (33), where the agreement will be seen to be fairly good even for t and Q/w of the order of 4 or 5. Note also that (33) nearly always gives a value of f that is too large.

For completeness, we may repeat (27):

$$f = 1 \qquad \text{for } t < Q/w. \qquad (34)$$

Note that only the dimensionless quantities tw, Q enter into (33), (34), which are therefore independent of the normalization (15).
NONPARAMETRIC ESTIMATION, III. STATISTICALLY EQUIVALENT
BLOCKS AND MULTIVARIATE TOLERANCE 
REGIONS—THE DISCONTINUOUS CASE 

By John W. Tukey 
Princeton University 

1. Summary. In Paper II of this series [2, 1947] it was shown that if n functions and a sample of n were used to divide the population space into n + 1 blocks in a particular way, and if the joint cumulative of the functions were continuous, then the n + 1 fractions of the population, corresponding to the n + 1 blocks, were distributed symmetrically and simply.

In Paper I of this series [1, 1945] it was shown that the one-dimensional theory of tolerance regions could be extended to the discontinuous case, if equalities were replaced by inequalities.

In this paper the results of Paper II will be extended to the discontinuous case with the same weakening of the conclusion. The devices involved are more complex, but the nature of the results is the same (see Section 5).

As a tool, it is shown that any n-variate distribution can be represented in terms of an n-variate distribution with a continuous joint cumulative (in fact, with uniform univariate marginals), where each variate of the given distribution is a different monotone function of the corresponding variate from the continuous distribution.


2. Introduction. The importance of extending the simple results of the continuous case to the more complex results of the discontinuous case may not be clear at first thought. Yet all the data with which the statistician actually works come from discontinuous distributions. Often these distributions are very fine-grained: the distributions of the number of eggs laid by codfish and of the measured wavelengths of a spectral line (measured in 0.000001 Å) do not have large concentrated probabilities, but all their probability is concentrated at discrete points. Insofar as the considerations of the theoretical statistician apply to the data as received rather than to the "data" of a more or less imaginary model, these considerations apply to data with a discrete distribution. When his theories are erected on a basis of a probability density function, or even a [...], there is a definite extrapolation from theory to practice.
Crete iet and 6 mathematical statistician to study die- 

effects whl rtifh h t ! n f r ° US laVge eSeCtB and P l6aaant *>*11 
sooner or it?face lis M P ^ ^ ^ *** ^ Wd mUSt 

In order to deal with the discontinuous case, we must face two problems (we assume that the reader is familiar with Paper II [2]):

(1) What to do about "ties"?
(2) Finite probabilities associated with cuts.

The first of these is peculiar to the multivariate situation and can be easily explained by an example. Consider the three points in the plane with coordinates (1, 9), (3, 9) and (2, 6). Let the first two functions be y and x; then the procedure of Section 4 of Paper II [2] is not unique, and two possibilities arise:

Alternative A. (1, 9) is selected as having the largest y, and (3, 9) as having the largest x among the remaining (two) points; hence $S_1 = \{(x, y) \mid y > 9\}$, $S_2 = \{(x, y) \mid y < 9,\ x > 3\}$, $S_3 = \{(x, y) \mid y < 9,\ x < 3\}$.

Alternative B. (3, 9) is selected as having the largest y, and (2, 6) as having the largest x among the remaining (two) points; hence $S_1' = \{(x, y) \mid y > 9\}$, $S_2' = \{(x, y) \mid y < 9,\ x > 2\}$, $S_3' = \{(x, y) \mid y < 9,\ x < 2\}$.

Notice that $S_2' \ne S_2$. The procedure is not unique. In the continuous case, ties happen with probability zero, hence their consequences could be neglected. This is now no longer the case.

This difficulty is solved by using more functions and the idea of lexicographical (like a dictionary!) ordering. In the simplest case, we add no new functions and proceed as follows. If there is a unique i for which $\varphi_1(w_i)$ is maximal, select it. Otherwise, look among the $w_i$ for which $\varphi_1(w_i)$ is maximal and look at the values of $\varphi_2(w_i)$. If there is a unique such i for which $\varphi_2(w_i)$ is maximal, select it. If not, go on to $\varphi_3(w_i)$, .... This procedure leads to a specific i unless $\varphi_h(w_j) = \varphi_h(w_k)$ for all h and some $j \ne k$. But in this case it does not matter whether j or k is selected: the set of m-tuples $(\varphi_1(w_i), \varphi_2(w_i), \ldots, \varphi_m(w_i))$ remaining will be the same, although the indices i will not. And the indices play no role in the actual construction.

As an example, consider the following 20 four-letter words as a sample and let there be four functions, the ith being the negative of the position in the alphabet of the ith letter of the word. (Thus a > b > c > ... > z.)

Sample: meet, west, made, gone, come, back, said, that, maid, well, with, with, just, week, very, near, edge, this, last, have. (The Law of the Three Just Men, Edgar Wallace, pp. 159-160.)

Selections: back, made, near, (gone, come, edge, have; the fourth selection to be made at random among these four). The inferences which can be made about the four-letter words in Edgar Wallace's writing vocabulary are left to the reader.

We have just given one rule for breaking ties, one which chooses Alternative B in our example. But we might prefer a rule which chooses Alternative A. To get more generality, we have only to take M functions, M ≥ m, and let $\varphi_{p(1)}, \varphi_{p(2)}, \ldots, \varphi_{p(m)}$ (where we may suppose p(1) = 1 without loss of generality) play the role just taken by $\varphi_1, \varphi_2, \ldots, \varphi_m$. Thus if the maximum of $\varphi_1(w)$ is not unique, proceed to $\varphi_2(w)$, thence to ..., thence to $\varphi_M(w)$. For the second block, start with $\varphi_{p(2)}$, then $\varphi_{p(2)+1}, \varphi_{p(2)+2}, \ldots, \varphi_M$. And so on. The choice

$\varphi_1(x, y) = y$,

$\varphi_2(x, y) = -x$,

$\varphi_3(x, y) = x e^y$,

$\varphi_4(x, y) = x$,

$\varphi_5(x, y) = y^2$,

with p(1) = 1 and p(2) = 4, leads to Alternative A above. (Note that $\varphi_3$ is a dummy in the sense that it is never used.) The problem of ties, which was a problem in uniqueness of construction, is thus dealt with.
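Because Python compares tuples lexicographically, the tie-breaking rule is short to express in code. The sketch below (illustrative names; points are (x, y) pairs) selects i(1), i(2), ... by maximizing Φ_k (formally defined in Section 3) among the points not yet chosen, and reproduces both alternatives for the three-point example. The five functions follow the list above, with φ5 = y² as we read it; φ5 never affects the outcome here.

```python
import math


def Phi(funcs, p, k, w):
    """Phi_k(w) = (phi_p(k)(w), ..., phi_M(w)); p holds 1-based starting indices."""
    return tuple(f(w) for f in funcs[p[k] - 1:])


def select(points, funcs, p):
    """Pick i(1), ..., i(m): each maximizes Phi_k lexicographically (Python tuple
    order) among the points not yet chosen. Exact ties are resolved by max()'s
    first-encountered rule, which the text notes does not affect the blocks."""
    remaining = list(range(len(points)))
    chosen = []
    for k in range(len(p)):
        best = max(remaining, key=lambda i: Phi(funcs, p, k, points[i]))
        chosen.append(points[best])
        remaining.remove(best)
    return chosen


pts = [(1, 9), (3, 9), (2, 6)]
simple = [lambda w: w[1], lambda w: w[0]]                  # phi1 = y, phi2 = x
alt_a = [lambda w: w[1], lambda w: -w[0], lambda w: w[0] * math.exp(w[1]),
         lambda w: w[0], lambda w: w[1] ** 2]              # the five functions above

if __name__ == "__main__":
    print(select(pts, simple, [1, 2]))   # Alternative B
    print(select(pts, alt_a, [1, 4]))    # Alternative A
```

The first call selects (3, 9) and then (2, 6), as in Alternative B; the second selects (1, 9) and then (3, 9), as in Alternative A.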

Next we must deal with the cuts. When we made $S_1$, $S_2$ and $S_3$ in Alternative A, we omitted some points, namely

$$T_1 = \{(x, y) \mid y = 9\} \quad\text{and}\quad T_2 = \{(x, y) \mid y < 9,\ x = 3\}.$$

In the continuous case this did not matter, since these sets had probability zero and could be avoided. Here they cannot, and we shall have to consider a family of blocks (in the wide sense) as consisting of the blocks S and the cuts T. The solution of the univariate case in Paper I [1] shows us that what we must expect is that:


$$\Pr\{\text{coverage of } S_i + T_{i-1} + T_i \ge l\} \ge \Pr\{\text{coverage of one continuous-case block} \ge l\} \ge \Pr\{\text{coverage of } S_i \ge l\}.$$

That is, if we want a certain set of blocks to cover (together) at least a certain amount with a certain probability, we must add the adjoining cuts; and if we want a certain set of blocks to cover at most a certain amount with a certain probability, we may add only those cuts which do not adjoin blocks not in our set. By introducing the cuts explicitly, we solve the second problem.

In order to reduce the size of the cuts, our detailed definitions will differ in detail from those which we have used so far. In the example, where the functions leading to Alternative A are used, we place in $S_1$ not only the points with y > 9, but also those with y = 9 and -x > -1; we place in $S_2$ not only the points with y < 9 and x > 3 and the points with y < 9, x = 3, y > 9, but also those with y = 9 and -x < -1. Proceeding in this way, we reduce $T_1$ to the point x = 1, y = 9 and $T_2$ to the point x = 3, y = 9. This reduction can only diminish the probability associated with the cuts, but we cannot be sure that it will reduce it to zero.

Only if the probability that all functions shall tie together is zero do we return to the simplicity of the continuous case. This case [...] discrete probabilities, and real observations always involve discrete probabilities.

Having stated the results, we can now briefly touch on methods. The proof of the main theorems depends on two facts:

(1) a representation theorem, (5.3), and

(2) a lemma, (6.1), which shows that m functions would be enough if (i) the distribution were fixed, and (ii) cases of probability zero were neglected.

The representation theorem has been outlined in the summary. It is analogous to, but a definite extension of, the one used in Paper I [1]. It seems to be new in statement, though not in thought; it will surprise few probability theorists. The novel element is the monotonicity of the functions, which is utterly essential for our purposes.

The lemma allows us to reduce the general case to the case of no extra functions, where the reduction must be made differently for each underlying distribution. The reduced functions are then represented by the representation theorem and the results of Paper II [2] are taken over. The results are stated in a form independent of the underlying distribution and the particular representation, hence they apply in general.

The last paragraph stresses the principle common to Paper I [1] and this paper. It is natural to call it the "iceberg principle," and to sketch it as follows: "We have some information about the visible one-ninth of the iceberg, and we want to conclude something about this visible part. If we can imagine another eight-ninths, consistent with the part we know, and if using that we can prove something expressed solely in terms of the visible part, then this is the required proof. (The only essential is to be able to match every visible part.)" Both the reduced functions (which depend on the underlying distribution) and the uniform variables used to represent them are part of the invisible eight-ninths which "could be there."

3. Terminology and Notation. In general we use the terminology and notation of Paper II [2], and we shall continue to assume that all functions concerned in the argument are measurable.

Given two finite sequences of the same length, we write $(a_1, a_2, \ldots, a_m) > (b_1, b_2, \ldots, b_m)$ if any of the following hold:

$a_1 > b_1$,

$a_1 = b_1$ and $a_2 > b_2$,

$a_1 = b_1$, $a_2 = b_2$, and $a_3 > b_3$,

...

$a_i = b_i$ for $i < m$, and $a_m > b_m$.

This is the lexicographical order referred to above. (We interpret $(a_1, a_2, \ldots, a_m) < (b_1, b_2, \ldots, b_m)$ to mean $(b_1, b_2, \ldots, b_m) > (a_1, a_2, \ldots, a_m)$, and = to mean identity.)
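The definition above is exactly the order Python uses for tuples of equal length. The sketch below (illustrative name lex_greater) spells out the definition case by case so that it can be checked against the built-in comparison.

```python
from itertools import product


def lex_greater(a, b) -> bool:
    """(a_1, ..., a_m) > (b_1, ..., b_m) in the lexicographical order above:
    the first position at which the sequences differ decides."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    for x, y in zip(a, b):
        if x != y:
            return x > y
    return False  # identical sequences are equal, not greater


# Exhaustive comparison with Python's own tuple ordering on a small grid.
pairs = list(product(range(3), repeat=2))
agree = all(lex_greater(a, b) == (a > b) for a in pairs for b in pairs)

if __name__ == "__main__":
    print(agree, lex_greater((9, -1), (9, -3)))
```

In practice this means that the tuples Φ_k(w) introduced next can be compared directly with Python's `>` and `max`.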

3.1 Definition: Given a sequence of real-valued functions $\varphi_1, \varphi_2, \ldots, \varphi_M$ and a sequence of starting indices $p(1), p(2), \ldots, p(m)$ (which we shall often refer to, briefly, as an m-system of functions $\varphi_1, \varphi_2, \ldots, \varphi_M$, without explicitly mentioning the starting indices), the functions $\Phi_1, \Phi_2, \ldots, \Phi_m$ are defined as follows:

$$\Phi_k(w) = \{\varphi_{p(k)}(w), \varphi_{p(k)+1}(w), \ldots, \varphi_M(w)\}, \qquad (3.2)$$

the values of $\Phi_k$ being sequences of $M - p(k) + 1$ numbers. (In these terms, the rule for tie-breaking already explained becomes "select an i for which $\Phi_k(w_i)$ is maximal (in the sense of lexicographical ordering)".)

4. The blocks and cuts determined by n points. 4.1 Definition: Given an m-system of functions $\varphi_1, \varphi_2, \ldots, \varphi_M$ and n points $w_1, w_2, \ldots, w_n$ ($m \le n$), the corresponding blocks and cuts are given by the following procedure (the $\Phi_k$ are defined in 3.1). First i(1) is selected to maximize $\Phi_1(w_i)$, when

$$S_1 = \{w \mid \Phi_1(w) > \Phi_1(w_{i(1)})\},$$

$$T_1 = \{w \mid \Phi_1(w) = \Phi_1(w_{i(1)})\}.$$

Next, i(2) is selected $\ne i(1)$ and to maximize $\Phi_2(w_i)$ among such i, when

$$S_2 = \{w \mid \Phi_1(w) < \Phi_1(w_{i(1)}),\ \Phi_2(w) > \Phi_2(w_{i(2)})\},$$

$$T_2 = \{w \mid \Phi_1(w) < \Phi_1(w_{i(1)}),\ \Phi_2(w) = \Phi_2(w_{i(2)})\}.$$

... (the construction is perfectly analogous to II-4.1)

$$S_{m|n+1} = \{w \mid \Phi_k(w) < \Phi_k(w_{i(k)}),\ k = 1, 2, \ldots, m\}.$$
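Definition 4.1 amounts to a sequential test: compare Φ1(w) with Φ1(w_i(1)), and only if it is smaller move on to Φ2, and so on. The Python sketch below (illustrative names; it uses the five functions and p(1) = 1, p(2) = 4 of the Alternative A example, with φ5 = y² as we read it) locates a point in a block S_k or a cut T_k, given the points already selected. With this choice the cuts shrink to the single points (1, 9) and (3, 9), as stated in Section 2.

```python
import math


def Phi(funcs, p, k, w):
    """(3.2) with 1-based starting indices p[0], p[1], ..."""
    return tuple(f(w) for f in funcs[p[k] - 1:])


def classify(w, selected, funcs, p):
    """Locate w among the blocks and cuts of Definition 4.1.

    selected[k] is the point w_i(k+1). Returns ("S", k) or ("T", k) with k
    1-based, or ("S", "m|n+1") for the final block."""
    for k, v in enumerate(selected):
        a, b = Phi(funcs, p, k, w), Phi(funcs, p, k, v)
        if a > b:
            return ("S", k + 1)
        if a == b:
            return ("T", k + 1)
        # a < b: w lies past block k; compare with the next selected point
    return ("S", "m|n+1")


alt_a = [lambda w: w[1], lambda w: -w[0], lambda w: w[0] * math.exp(w[1]),
         lambda w: w[0], lambda w: w[1] ** 2]
sel = [(1, 9), (3, 9)]          # i(1), i(2) in Alternative A

if __name__ == "__main__":
    for pt in [(5, 10), (0, 9), (1, 9), (5, 9), (3, 9), (2, 5)]:
        print(pt, classify(pt, sel, alt_a, [1, 4]))
```

Points on the line y = 9 with x < 1 fall in S1 and those with x > 3 fall in S2, which matches the enlarged blocks described in Section 2.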

4.2 Definition: If m = n, then $S_{n|n+1}$ is also denoted by $S_{n+1}$. If m > n, then only $\Phi_1, \Phi_2, \ldots, \Phi_n$ are used and $S_{n|n+1}$ is also denoted by $S_{n+1}$.

We denote by λ a subset (possibly none, possibly all) of the indices 1, 2, ..., m and m|n + 1 or, in case m > n, of the indices 1, 2, ..., n + 1.

4.3 Definition. The block-group $B_\lambda$ consists of the union of all $S_i$ with i in λ and all $T_i$ with both i and i + 1 in λ (m + 1 means m|n + 1).

The closed block-group $\bar{B}_\lambda$ consists of the union of all $S_i$ with i in λ and all $T_i$ with either i or i + 1 in λ.

Given any set, we define its coverage as the proportion of the population falling into it (here the underlying probability distribution appears for the first time in this section), and we use

4.4 Definition: The coverage of $B_\lambda$ is denoted by $C(\lambda)$ and that of $\bar{B}_\lambda$ by $\bar{C}(\lambda)$.

Thus, given a family of functions φ and n points w, the space of the w is divided into blocks and cuts; these are joined together into block-groups, and these block-groups have coverages. Thus, if the family of functions is fixed, the n points determine these coverages, and, if the points are chance points, the coverages are chance numbers.

5. Statement of results. Having discussed the construction, we can now state the results.

(5.1) Theorem $A_{m|n+1}$. Let $\varphi_1, \varphi_2, \ldots, \varphi_M$ be any m-system of functions and let $W_1, W_2, \ldots, W_n$, where $m \le n$, be a sample from any distribution; let the blocks, cuts, block-groups and coverages be formed as described above, using the



NON-PARAMETKIC ESTIMATION" III 


35 


sa»ii’(unknown) distribution for forming the coverages. Then, if on, a t a p 
are any set of \’s (each X is a set of indices '.), 

Pr {(ai) < ai, C(<xf) < aj, > • • , C{a *) > ah , ■ ■ •, C(ot p ) > a„) 

> Pr [i(ai) < at, Kaf) < a» , - • ■ , t(a k ) > a* , • • ■ , <(«,) > a p ), 

where i(\) = for i in X, i«| n+l = t m n + • * • -f f»+i , and h, h, • • • , <„ +l have 

a uniform distribution on the barycentric simplex. (Compare Theorem A m | B n of 
Paper II [2].) 

In particular, 

Pr {<7(0 < aj > 7,(1, n) > Pr {<7(0 < a}, i = 1, 2, • •. , w, 

where I„(l, n) is the incomplete Beta-function. 
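The comparison distribution in these theorems is easy to simulate. The following sketch is our own illustration, not part of the paper. It draws a point uniform on the barycentric simplex by normalizing n + 1 independent exponential variates, a standard construction. Each coordinate $t_i$ then has the Beta(1, n) distribution, so $\Pr\{t_i \le a\} = I_a(1, n) = 1 - (1-a)^n$. The function names, the seed and the number of trials are arbitrary choices.

```python
import random

def uniform_simplex(k, rng):
    """Draw (t_1, ..., t_k) uniformly on the simplex t_i >= 0, sum t_i = 1,
    by normalizing k independent unit exponentials."""
    e = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(e)
    return [x / total for x in e]

def incomplete_beta_1n(a, n):
    """I_a(1, n) = 1 - (1 - a)**n, the cdf of a Beta(1, n) variate at a."""
    return 1.0 - (1.0 - a) ** n

def mc_prob_t_le(a, n, trials=100_000, seed=1):
    """Monte Carlo estimate of Pr{t_1 <= a} for t uniform on the simplex with n + 1 coordinates."""
    rng = random.Random(seed)
    hits = sum(uniform_simplex(n + 1, rng)[0] <= a for _ in range(trials))
    return hits / trials

n, a = 5, 0.1
print(incomplete_beta_1n(a, n))   # exact value, 1 - 0.9**5
print(mc_prob_t_le(a, n))         # should agree to about two decimals
```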

(5.2) Theorem $B_{n+1}$. Let $\varphi_1, \varphi_2, \dots, \varphi_n$ be any n-system of functions and let $W_1, W_2, \dots, W_n$ be a sample from any distribution. Then

$$\Pr\{C(\alpha_1) \le a_1,\ C(\alpha_2) \le a_2,\ \dots,\ \bar C(\alpha_k) \ge a_k,\ \dots,\ \bar C(\alpha_p) \ge a_p\}$$
$$\ge \Pr\{t(\alpha_1) \le a_1,\ t(\alpha_2) \le a_2,\ \dots,\ t(\alpha_k) \ge a_k,\ \dots,\ t(\alpha_p) \ge a_p\},$$

where $t(\lambda) = \sum_{i \in \lambda} t_i$ and $t_1, t_2, \dots, t_{n+1}$ have a uniform distribution on the barycentric simplex. In particular,

$$\Pr\{C(i) \le a\} \ge I_a(1, n) \ge \Pr\{\bar C(i) \le a\}, \qquad i = 1, 2, \dots, n+1.$$

For convenience of reference, we also state the representation theorem as:

(5.3) Theorem C. Let $X_1, X_2, \dots, X_n$ have any joint n-variate distribution. Then there exist (real) functions $g_1, g_2, \dots, g_n$ and a joint distribution for $U_1, U_2, \dots, U_n$ such that

(i) the marginal distribution of each $U_i$ is uniform on [0, 1],

(ii) each function $g_i$ is non-decreasing,

(iii) the distribution of $(g_1(U_1), g_2(U_2), \dots, g_n(U_n))$ is identical with that of $(X_1, X_2, \dots, X_n)$.

6. The functions ψ. The aim of this section is to prove

(6.1) Lemma. Given any m-system of functions $\varphi_1, \varphi_2, \dots, \varphi_m$, there exist real functions $\psi_1, \psi_2, \dots, \psi_m$ such that, if $W_1, W_2, \dots, W_n$ are a sample from the distribution concerned:

$$\Pr\{\psi_h(W_j) = \psi_h(W_k) \text{ but } \psi_{h+l}(W_j) \ne \psi_{h+l}(W_k) \text{ for some } l > 0\} = 0, \tag{6.2}$$

$$\Pr\{\Phi_i(W_j) \text{ has a different relation to } \Phi_i(W_k) \text{ than that of } \psi_i(W_j) \text{ to } \psi_i(W_k)\} = 0, \tag{6.3}$$

where by relation is meant >, =, or <.

The $\psi_i$ will depend on the underlying probability distribution. Thus they are useful in the proof, but could not replace the $\Phi_i$ in the statement of the theorems.

(6.4) Lemma. Let $\Phi(w)$ have its values in a totally ordered set (i.e., always either $\Phi_1 < \Phi_2$, $\Phi_1 = \Phi_2$, or $\Phi_1 > \Phi_2$) and let W have a distribution. Consider the function

$$\psi(w) = \Pr\{\Phi(W) < \Phi(w)\}.$$

Let $W_1, W_2, \dots, W_n$ be a sample from the same distribution. Then, with probability one, the relation (<, =, or >) between $\Phi(W_j)$ and $\Phi(W_k)$ is the same as that between $\psi(W_j)$ and $\psi(W_k)$.

If $\Phi(w_j) < \Phi(w_k)$, then $\psi(w_j) \le \psi(w_k)$; if $\psi(w_j) < \psi(w_k)$, then $\Phi(w_j) < \Phi(w_k)$. These follow directly from the definition. To prove the lemma, then, we must show that

(i) $\psi(w_j) = \psi(w_k)$ but $\Phi(w_j) < \Phi(w_k)$

occurs with probability zero.

We may clearly assume that the totally ordered set is complete and that, in particular, it contains the symbols −∞ and +∞. Consider the real function of an abstract variable,

$$F(s) = \Pr\{\Phi(W) < s\}.$$

It is a monotone function, with $F(-\infty) = 0$ and $F(+\infty) = 1$. We can therefore, given $\varepsilon > 0$, select elements $-\infty = s_0 < s_1 < \dots < s_k = +\infty$ such that

$$0 \le F(s_{i+1}) - F(s_i + 0) < \varepsilon.$$

If (i) occurs, then $\Phi(w_j)$ and $\Phi(w_k)$ belong either to the same open interval $(s_i, s_{i+1})$, or one belongs to an open interval and the other is its upper endpoint. The probability of either of these happening is at most

$$\{F(s_{i+1}) - F(s_i + 0)\}^2 + 2\{F(s_{i+1}) - F(s_i + 0)\}\{F(s_{i+1} + 0) - F(s_{i+1})\}.$$

Summing this over all intervals, and over the $n(n-1)/2$ pairs $(j, k)$, yields an estimate of at most

$$n(n-1)\max_i\{F(s_{i+1}) - F(s_i + 0)\} < n(n-1)\,\varepsilon.$$

Since this goes to zero with ε, the lemma is established.

We turn now to the proof of (6.1). The system of functions $\varphi_1, \varphi_2, \dots, \varphi_m$ defines $\Phi_1, \Phi_2, \dots, \Phi_m$ according to Section 3. These define $\psi_1, \psi_2, \dots, \psi_m$ according to Lemma (6.4) just proved. Applying this m times proves (6.3). Recalling that $\Phi_h(w_j) = \Phi_h(w_k)$ implies $\Phi_{h+l}(w_j) = \Phi_{h+l}(w_k)$, we see that (6.3) implies (6.2).
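Lemma (6.4) can be checked on a small example. The sketch below assumes a hypothetical finite distribution in which one value has probability zero. It uses lexicographically ordered tuples as the totally ordered set, and `make_psi` and `relation` are illustrative names. On the points of positive probability, ψ preserves every order relation of Φ. At the null point a tie appears, which is why the lemma holds only with probability one.

```python
from fractions import Fraction
from itertools import combinations

def make_psi(dist, Phi):
    """psi(w) = Pr{Phi(W) < Phi(w)} for a finite distribution {value: probability}."""
    def psi(w):
        return sum((p for v, p in dist.items() if Phi(v) < Phi(w)), Fraction(0))
    return psi

def relation(a, b):
    """-1, 0 or 1 according as a < b, a = b or a > b."""
    return (a > b) - (a < b)

# Hypothetical example; the value "b" has probability zero.
dist = {"a": Fraction(1, 2), "b": Fraction(0), "c": Fraction(1, 4), "d": Fraction(1, 4)}
PHI_VALUES = {"a": (0, 1), "b": (0, 2), "c": (1, 0), "d": (2, 5)}  # compared lexicographically
Phi = PHI_VALUES.__getitem__
psi = make_psi(dist, Phi)

support = [w for w, p in dist.items() if p > 0]
agree = all(relation(Phi(u), Phi(v)) == relation(psi(u), psi(v))
            for u, v in combinations(support, 2))
print(agree)                                      # relations agree on the support
print(psi("b") == psi("c"), Phi("b") < Phi("c"))  # a tie created only at the null point
```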

7. The notation F(x + λ·0). All practitioners of analysis are familiar with F(x + 0) and F(x − 0), defined by

$$F(x \pm 0) = \lim_{h \to 0+} F(x \pm h).$$

We now generalize this formal notation to

$$F(x + \lambda\cdot 0) = \frac{1+\lambda}{2}\,F(x + 0) + \frac{1-\lambda}{2}\,F(x - 0), \tag{7.1}$$

where we will, in our immediate applications, need only λ's between −1 and +1 (although the definition applies in general). Notice, for example, that for a non-decreasing F

$$F(x - 0) \le F(x + \lambda\cdot 0) \le F(x + 0), \qquad -1 \le \lambda \le 1,$$

that if F is continuous at x,

$$F(x + \lambda\cdot 0) = F(x \pm 0) = F(x),$$

and that the condition for F to be normalized is

$$F(x + 0\cdot 0) = F(x).$$
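For a cumulative with jumps, definition (7.1) can be evaluated directly. In this sketch the cumulative is taken to be right-continuous, so that $F(x + 0) = \Pr\{X \le x\}$ and $F(x - 0) = \Pr\{X < x\}$. The example distribution and the function names are our own.

```python
from fractions import Fraction

def cdf_pair(dist):
    """Return F(x + 0) = Pr{X <= x} and F(x - 0) = Pr{X < x} for a finite {value: probability}."""
    def F_plus(x):
        return sum((p for v, p in dist.items() if v <= x), Fraction(0))
    def F_minus(x):
        return sum((p for v, p in dist.items() if v < x), Fraction(0))
    return F_plus, F_minus

def F_lam(dist, x, lam):
    """F(x + lam*0) = (1 + lam)/2 * F(x + 0) + (1 - lam)/2 * F(x - 0), as in (7.1)."""
    F_plus, F_minus = cdf_pair(dist)
    lam = Fraction(lam)
    return (1 + lam) / 2 * F_plus(x) + (1 - lam) / 2 * F_minus(x)

dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}  # hypothetical example
print(F_lam(dist, 1, -1), F_lam(dist, 1, 0), F_lam(dist, 1, 1))  # F(1 - 0), normalized value, F(1 + 0)
print(F_lam(dist, Fraction(1, 2), Fraction(1, 3)))               # continuity point: same for every lam
```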

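A similar definition is made for functions of two variables, namely

$$\begin{aligned} F(x + \lambda\cdot 0,\ y + \mu\cdot 0) &= \frac{1+\mu}{2}F(x + \lambda\cdot 0,\ y + 0) + \frac{1-\mu}{2}F(x + \lambda\cdot 0,\ y - 0) \\ &= \frac{1+\lambda}{2}F(x + 0,\ y + \mu\cdot 0) + \frac{1-\lambda}{2}F(x - 0,\ y + \mu\cdot 0), \end{aligned}$$

where the two right-hand sides are equal if, as is the case for cumulatives, all doubly one-sided limits exist.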

If $F(x_1, x_2)$ is the joint cumulative of two variates, then, when all ordinates and abscissas involved are ordinates and abscissas of continuity,

$$\Pr\{a < x \le b,\ c < y \le d\} = F(b, d) - F(b, c) - F(a, d) + F(a, c) \ge 0.$$

Passing to the limit in assorted ways, and taking linear combinations, gives

$$F(b + \mu\cdot 0,\ d + \rho\cdot 0) - F(b + \mu\cdot 0,\ c + \nu\cdot 0) - F(a + \lambda\cdot 0,\ d + \rho\cdot 0) + F(a + \lambda\cdot 0,\ c + \nu\cdot 0) \ge 0, \tag{7.2}$$

for $-\infty \le a < b \le +\infty$, $-\infty \le c < d \le +\infty$ and $-1 \le \lambda, \mu, \nu, \rho \le 1$. This will be of use shortly.

8. The representation theorem. It was shown in Paper I [1] of this series that the uniform distribution on [0, 1] could serve as the prototype of any variate. That is, given a distribution, there is a monotone function g such that g(U) has the given distribution, where U has the uniform distribution on [0, 1]. (In Paper I, U was denoted by X*.)

In the notation of the last section, there is a function λ(u), with |λ(u)| ≤ 1, so that

$$F(g(u) + \lambda(u)\cdot 0) = u \tag{8.1}$$

for all u. (We may, and shall, require that g(u) = −∞ for u < 0, and g(u) = +∞ for u > 1.) It is easy to see that g(u) is unique except on a set of probability zero, and that λ(u) is unique (and in fact linear) on each open interval which contains no value of F(x).
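For a discrete cumulative, g(u) and λ(u) in (8.1) can be computed explicitly:

- g(u) is the smallest atom v with $F(v + 0) \ge u$;
- inside the jump at v, λ(u) is the linear function of u that makes (7.1) return exactly u.

The sketch assumes a finite distribution given as `{value: probability}` and 0 < u ≤ 1. The function names are hypothetical.

```python
from fractions import Fraction

def g_and_lambda(dist, u):
    """Return (g(u), lam(u)) satisfying F(g(u) + lam(u)*0) = u, for 0 < u <= 1."""
    u = Fraction(u)
    cum = Fraction(0)
    for v in sorted(dist):
        lo, hi = cum, cum + dist[v]          # F(v - 0) and F(v + 0)
        cum = hi
        if dist[v] > 0 and hi >= u:
            return v, (2 * u - hi - lo) / (hi - lo)   # linear in u across the jump
    raise ValueError("u must lie in (0, 1]")

def F_lam(dist, x, lam):
    """F(x + lam*0) from (7.1), with F(x + 0) = Pr{X <= x} and F(x - 0) = Pr{X < x}."""
    plus = sum((p for v, p in dist.items() if v <= x), Fraction(0))
    minus = sum((p for v, p in dist.items() if v < x), Fraction(0))
    return (1 + lam) / 2 * plus + (1 - lam) / 2 * minus

dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}  # hypothetical example
for k in (1, 4, 5, 9, 12):
    u = Fraction(k, 12)
    g, lam = g_and_lambda(dist, u)
    print(u, g, lam, F_lam(dist, g, lam) == u)
```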

Each cumulative F(x) then serves to define g(u) and λ(u) by equation (8.1). Two or more independent variates can be thrown back on a set of independent uniform variates by applying this process to their cumulatives separately.

Our present problem is to prove Theorem C (5.3), which applies to variates $X_1, X_2, \dots, X_n$ which need not be independent. Let $F_i(x_i)$ be the (marginal) cumulative of $X_i$, and use (8.1) to define $g_i(u_i)$ and $\lambda_i(u_i)$. Then define the joint distribution of $U_1, U_2, \dots, U_n$ by

$$G(u_1, u_2, \dots, u_n) = F(g_1(u_1) + \lambda_1(u_1)\cdot 0,\ \dots,\ g_n(u_n) + \lambda_n(u_n)\cdot 0),$$

where $F(x_1, x_2, \dots, x_n)$ is the joint cumulative of $X_1, X_2, \dots, X_n$.

We shall verify that this is the desired distribution in the case n = 2, leaving the general case to the reader. Consider $G(u_1, +\infty) = G(u_1, 1) = F(g_1(u_1) + \lambda_1(u_1)\cdot 0,\ +\infty)$. This is a cumulative, and so is $G(+\infty, u_2)$. In fact, using (8.1), they are each the uniform cumulative

$$G(u) = \begin{cases} 0, & u < 0, \\ u, & 0 \le u \le 1, \\ 1, & 1 < u. \end{cases}$$

By (7.2) all second differences are non-negative, and hence $G(u_1, u_2)$ is a joint cumulative. Since its marginals are uniform, it is continuous.

Finally,

$$\Pr\{g_1(U_1) < s_1,\ g_2(U_2) < s_2\} = G(F(s_1 - 0, +\infty),\ F(+\infty, s_2 - 0)) = F(s_1 - 0,\ s_2 - 0),$$

since $g_1(U_1) < s_1$ is equivalent to $U_1 < F(s_1 - 0, +\infty)$ and $g_2(U_2) < s_2$ is equivalent to $U_2 < F(+\infty, s_2 - 0)$. Thus $g_1(U_1)$ and $g_2(U_2)$ have the given bivariate distribution.
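The n = 2 construction can also be checked numerically for a discrete joint distribution. The sketch below uses a hypothetical three-point joint law. It builds $G(u_1, u_2)$ from the two-variable form of (7.1) together with the marginal $g_i$, $\lambda_i$ of (8.1), and checks three things on a grid:

- G has uniform marginals;
- rectangle probabilities are non-negative;
- $G(F_1(s_1 - 0), F_2(s_2 - 0)) = F(s_1 - 0, s_2 - 0)$.

Only grid points with $0 < u_i \le 1$ are used, and all names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint law of (X, Y) on three points.
JOINT = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 2)}

def marginal(joint, axis):
    """Marginal {value: probability} of coordinate `axis`."""
    out = {}
    for point, p in joint.items():
        out[point[axis]] = out.get(point[axis], Fraction(0)) + p
    return out

def g_and_lambda(dist, u):
    """(g(u), lam(u)) with F(g(u) + lam(u)*0) = u, for 0 < u <= 1, as in (8.1)."""
    cum = Fraction(0)
    for v in sorted(dist):
        lo, hi = cum, cum + dist[v]
        cum = hi
        if dist[v] > 0 and hi >= u:
            return v, (2 * u - hi - lo) / (hi - lo)
    raise ValueError("u must lie in (0, 1]")

def F2(joint, x, lam, y, mu):
    """F(x + lam*0, y + mu*0): bilinear mix of the four one-sided limits (section 7)."""
    def side_ok(v, c, s):            # s = +1: v <= c ('+0' limit); s = -1: v < c ('-0' limit)
        return v <= c if s > 0 else v < c
    total = Fraction(0)
    for sx, sy in product((1, -1), repeat=2):
        weight = (1 + sx * lam) / 2 * (1 + sy * mu) / 2
        mass = sum((p for (a, b), p in joint.items()
                    if side_ok(a, x, sx) and side_ok(b, y, sy)), Fraction(0))
        total += weight * mass
    return total

M1, M2 = marginal(JOINT, 0), marginal(JOINT, 1)

def G(u1, u2):
    """G(u1, u2) = F(g1(u1) + lam1(u1)*0, g2(u2) + lam2(u2)*0)."""
    x, lam = g_and_lambda(M1, u1)
    y, mu = g_and_lambda(M2, u2)
    return F2(JOINT, x, lam, y, mu)

def strictly_below(dist, s):
    """F_i(s - 0) = Pr{X_i < s}."""
    return sum((p for v, p in dist.items() if v < s), Fraction(0))

grid = [Fraction(k, 8) for k in range(1, 9)]
uniform_marginals = all(G(u, 1) == u and G(1, u) == u for u in grid)
rectangles_nonneg = all(G(b1, b2) - G(a1, b2) - G(b1, a2) + G(a1, a2) >= 0
                        for a1, b1 in zip(grid, grid[1:]) for a2, b2 in zip(grid, grid[1:]))
reproduces_F = all(
    G(strictly_below(M1, s1), strictly_below(M2, s2))
    == sum((p for (a, b), p in JOINT.items() if a < s1 and b < s2), Fraction(0))
    for s1, s2 in product((1, 2), repeat=2))
print(uniform_marginals, rectangles_nonneg, reproduces_F)
```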

9. Proof of main theorems. We come now to the proof of Theorems $A_{m|n+1}$ and $B_{n+1}$, and we begin with $A_{m|n+1}$. According to Lemma (6.1), the various indices $i(1), i(2), \dots, i(m)$ selected to determine the blocks will be the same, excluding cases of probability zero, whether the $\Phi_i$ or the $\psi_i$ are used. Consider the first block, which takes the forms

$$S_1 = \{W \mid \Phi_1(W) > \Phi_1(W_{i(1)})\},$$
$$S_1' = \{W \mid \psi_1(W) > \psi_1(W_{i(1)})\}.$$

Another application of Lemma (6.1) shows that these sets differ by a set of probability zero, and hence their coverages are identical. It will thus suffice to prove Theorem $A_{m|n+1}$ for a fixed underlying distribution and the corresponding $\psi_1, \psi_2, \dots, \psi_m$.

According to Theorem C (5.3), the m-variate distribution of the $\psi_i(W)$ can be represented in terms of uniformly distributed variates $U_1, \dots, U_m$ and monotone functions $g_1(U_1), \dots, g_m(U_m)$. Now $U_1, U_2, \dots, U_m$ have a continuous joint cumulative, so Theorem $A_{m|n+1}$ applies to a sample of n drawn from this m-variate population, with the coordinates themselves as the m functions. We shall denote the coordinates of the t-th element of this sample by $u_1(t), \dots, u_m(t)$. Consider the first block,

$$S_1^* = \{U \mid U_1 > u_1(i(1))\}.$$

Its image contains

$$S_1 = \{W \mid \psi_1(W) > g_1(u_1(i(1)))\}$$

and is contained in the union of $S_1$ and $T_1$, where

$$T_1 = \{W \mid \psi_1(W) = g_1(u_1(i(1)))\}.$$

Thus the conclusions of Theorem $A_{m|n+1}$ hold for $S_1, T_1, \dots, S_m, T_m, S_{m|n+1}$.

Now while Theorem $A_{m|n+1}$ mentions the underlying W's implicitly, careful study shows that they are not really involved. Only the joint distribution of the $\varphi_i$ matters, and in our present case these are the $\psi_i$. Since this distribution is the same for the $\psi_i(W)$ and the $g_i(U_i)$, Theorem $A_{m|n+1}$ must hold for the $\psi_i$, and the theorem is proved.

Theorem $B_{n+1}$ is again a special case of Theorem $A_{n|n+1}$.


ASYMPTOTIC PROPERTIES OF THE MAXIMUM LIKELIHOOD
ESTIMATE OF AN UNKNOWN PARAMETER OF A DISCRETE 
STOCHASTIC PROCESS 

By Abraham Wald 

Columbia University 

Summary. Asymptotic properties of maximum likelihood estimates have been studied so far mainly in the case of independent observations. In this paper the case of stochastically dependent observations is considered. It is shown that, under certain restrictions on the joint probability distribution of the observations, the maximum likelihood equation has at least one root which is a consistent estimate of the parameter θ to be estimated. Furthermore, any root of the maximum likelihood equation which is a consistent estimate of θ is shown to be asymptotically efficient. Since the maximum likelihood estimate is always a root of the maximum likelihood equation, consistency of the maximum likelihood estimate implies its asymptotic efficiency.

1. Introduction. Let $\{X_i\}$, (i = 1, 2, ..., ad inf.), be a sequence of chance variables. It is assumed that for any positive integral value n the first n chance variables $X_1, \dots, X_n$ admit a joint probability density function $p_n(x_1, \dots, x_n, \theta)$ involving an unknown parameter θ. The consistency relations

$$\int_{-\infty}^{\infty} p_{n+1}(x_1, \dots, x_{n+1}, \theta)\,dx_{n+1} = p_n(x_1, \dots, x_n, \theta) \tag{1.1}$$

are assumed to hold.

In what follows, for any chance variable u the symbol $E(u \mid \theta)$ will denote the expected value of u when θ is the true parameter value.

Let $t_n(x_1, \dots, x_n)$ be an unbiassed estimate of θ. Cramér [1] and Rao [2] have shown that, under some weak regularity conditions on the distribution function $p_n(x_1, \dots, x_n, \theta)$, the variance of $t_n$ cannot fall short of the value

$$\frac{1}{c_n(\theta)} = \frac{1}{E\left[\left(\frac{\partial \log p_n}{\partial \theta}\right)^2 \,\middle|\, \theta\right]}. \tag{1.2}$$

Thus, for any unbiassed estimate $t_n$ the variate $\sqrt{c_n(\theta)}\,(t_n - \theta)$ has mean value zero and variance ≥ 1. An estimate $t_n$ is called efficient if $\sqrt{c_n(\theta)}\,(t_n - \theta)$ has mean value zero and variance 1. A sequence $\{t_n\}$, (n = 1, 2, ..., ad inf.), of estimates is said to be asymptotically efficient if the mean of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ is zero and the variance of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ is 1 in the limit as $n \to \infty$. In the literature the additional requirement is usually made that the limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ be normal.
To distinguish between the two cases, according as the condition concerning the limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ is fulfilled or not, we introduce two terms. We shall say that $\{t_n\}$ is asymptotically efficient in the wide sense if it satisfies the conditions concerning the mean and the variance of $\sqrt{c_n(\theta)}\,(t_n - \theta)$. If, in addition, the limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ is normal, we shall say that $\{t_n\}$ is asymptotically efficient in the strict sense. Clearly, if $\{t_n\}$ is asymptotically efficient in the strict sense, it is also asymptotically efficient in the wide sense.
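To make $c_n(\theta)$ concrete for dependent observations, consider a hypothetical example not taken from the paper: the Gaussian autoregression $X_i = \theta X_{i-1} + \varepsilon_i$, with $X_0 = 0$ and independent standard normal $\varepsilon_i$. Here $\partial \log p_n/\partial\theta = \sum_i X_{i-1}(X_i - \theta X_{i-1})$ is a sum of martingale differences, so $c_n(\theta) = \sum_{i=1}^{n} \operatorname{Var}(X_{i-1})$. The sketch computes this exactly and compares it with a simulated second moment of the score. The parameter value, sample size, seed and number of replications are arbitrary.

```python
import random

def c_n(theta, n):
    """c_n(theta) = E[(d log p_n/d theta)^2] = sum_{i=1}^n Var(X_{i-1}) for the AR(1) example."""
    total, var_prev = 0.0, 0.0          # var_prev = Var(X_{i-1}); Var(X_0) = 0
    for _ in range(n):
        total += var_prev
        var_prev = theta * theta * var_prev + 1.0
    return total

def simulate_path(theta, n, rng):
    """x_0 = 0, x_i = theta * x_{i-1} + standard normal noise."""
    x = [0.0]
    for _ in range(n):
        x.append(theta * x[-1] + rng.gauss(0.0, 1.0))
    return x

def score(theta, x):
    """d log p_n / d theta = sum_i x_{i-1} (x_i - theta x_{i-1})."""
    return sum(x[i - 1] * (x[i] - theta * x[i - 1]) for i in range(1, len(x)))

theta, n, reps = 0.5, 50, 4000
rng = random.Random(7)
scores = [score(theta, simulate_path(theta, n, rng)) for _ in range(reps)]
mean_score = sum(scores) / reps
mean_sq = sum(s * s for s in scores) / reps
print(round(c_n(theta, n), 2), round(mean_sq, 2), round(mean_score, 2))
```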

A word of clarification is needed as to the meaning of the conditions concerning the mean and variance of $\sqrt{c_n(\theta)}\,(t_n - \theta)$. One interpretation would be that the requirement is that

$$\lim_{n\to\infty} E\left[\sqrt{c_n(\theta)}\,(t_n - \theta) \,\middle|\, \theta\right] = 0 \tag{1.3}$$

and

$$\lim_{n\to\infty} E\left[c_n(\theta)\,(t_n - \theta)^2 \,\middle|\, \theta\right] = 1. \tag{1.4}$$

Another interpretation would be that the limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$, provided that it exists as $n \to \infty$, should have zero mean and unit variance. These two interpretations are certainly not equivalent. It seems to the author that the mean and variance of the limiting distribution are more relevant than the limits of the mean and the variance. We shall, therefore, adopt the following definition of asymptotic efficiency:

Definition: A sequence $\{t_n\}$ of estimates is said to be asymptotically efficient in the wide sense if a sequence $\{u_n\}$, (n = 1, 2, ..., ad inf.), of chance variables exists such that

$$\lim_{n\to\infty} E(u_n \mid \theta) = 0, \qquad \lim_{n\to\infty} E(u_n^2 \mid \theta) = 1 \tag{1.5}$$

and

$$\sqrt{c_n(\theta)}\,(t_n - \theta) - u_n \tag{1.6}$$

converges stochastically to zero as $n \to \infty$. If, in addition, the limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ exists and is normal, $\{t_n\}$ is said to be asymptotically efficient in the strict sense.

A sequence $\{u_n\}$ of chance variables is used in the above definition, instead of the limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$, because the existence of that limiting distribution is not postulated. If a limiting distribution of $\sqrt{c_n(\theta)}\,(t_n - \theta)$ exists, and if it has zero mean and unit variance, then a sequence $\{u_n\}$ satisfying conditions (1.5) and (1.6) always exists. This can be seen as follows. Let $T_n$ denote the chance variable $\sqrt{c_n(\theta)}\,(t_n - \theta)$ and let $F_n(t) = \text{prob.}\{T_n < t\}$. If a limiting distribution of $T_n$ exists and if this limiting distribution has zero mean and unit variance, then

$$\lim_{a\to\infty}\lim_{n\to\infty}\int_{-a}^{a} t\,dF_n(t) = 0 \quad\text{and}\quad \lim_{a\to\infty}\lim_{n\to\infty}\int_{-a}^{a} t^2\,dF_n(t) = 1. \tag{1.7}$$

From (1.7) it follows that there exists a sequence $\{a_n\}$, (n = 1, 2, ..., ad inf.), of positive values such that the following conditions are fulfilled:

$$\lim_{n\to\infty}\int_{-a_n}^{a_n} t\,dF_n(t) = 0; \qquad \lim_{n\to\infty}\int_{-a_n}^{a_n} t^2\,dF_n(t) = 1; \qquad \lim_{n\to\infty}\text{Prob}\{|T_n| > a_n\} = 0. \tag{1.8}$$

Let $u_n$ be a chance variable which is equal to $T_n$ whenever $|T_n| \le a_n$, and equal to zero otherwise. Clearly, the sequence $\{u_n\}$ will satisfy conditions (1.5) and (1.6).

In the following section we shall formulate some assumptions concerning the probability density function $p_n(x_1, \dots, x_n, \theta)$. It will then be shown in section 3 that there exists a root of the maximum likelihood equation

$$\frac{\partial \log p_n}{\partial \theta} = 0 \tag{1.9}$$

which is asymptotically efficient at least in the wide sense.


2. Assumptions concerning the probability density $p_n(x_1, \dots, x_n, \theta)$. We shall assume that there exists a finite non-degenerate interval A on the θ-axis such that the following conditions hold:

Condition 1. The derivatives $\partial^i \log p_n/\partial\theta^i$, (i = 1, 2, 3), exist for all θ in A and for all samples $(x_1, \dots, x_n)$ except perhaps for a set of measure zero. We have, furthermore,



Condition 2. For any θ in A we have $\lim_{n\to\infty} c_n(\theta) = \infty$.

Condition 3. For any θ in A, the standard deviation of $\partial^2 \log p_n/\partial\theta^2$ divided by the expected value of $\partial^2 \log p_n/\partial\theta^2$ (both computed under the assumption that θ is true) converges to zero as $n \to \infty$.

Condition 4. There exists a positive δ such that for any θ in A the expression

$$\frac{1}{c_n(\theta)}\,E\left[\mathop{\mathrm{l.u.b.}}_{|\theta' - \theta| \le \delta}\left|\frac{\partial^3 \log p_n(x_1, \dots, x_n, \theta')}{\partial\theta'^3}\right| \,\middle|\, \theta\right] \tag{2.2}$$

is a bounded function of n, where θ' is restricted to the interval $|\theta' - \theta| \le \delta$.

In what follows in this section, as well as in section 3, the domain of θ will be restricted to interior points of the interval A, unless a statement to the contrary is explicitly made.

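Clearly

$$E\left(\frac{\partial \log p_n}{\partial \theta} \,\middle|\, \theta\right) = \int \cdots \int \frac{\partial p_n}{\partial \theta}\,dx_1 \cdots dx_n. \tag{2.3}$$

It follows from Condition 1 that

$$\int \cdots \int \frac{\partial p_n}{\partial \theta}\,dx_1 \cdots dx_n = 0. \tag{2.4}$$

Hence,

$$E\left(\frac{\partial \log p_n}{\partial \theta} \,\middle|\, \theta\right) = 0. \tag{2.5}$$

We have

$$\frac{\partial^2 \log p_n}{\partial \theta^2} = \frac{1}{p_n}\frac{\partial^2 p_n}{\partial \theta^2} - \left(\frac{\partial \log p_n}{\partial \theta}\right)^2. \tag{2.6}$$

Hence

$$E\left(\frac{\partial^2 \log p_n}{\partial \theta^2} \,\middle|\, \theta\right) = \int \cdots \int \frac{\partial^2 p_n}{\partial \theta^2}\,dx_1 \cdots dx_n - c_n(\theta). \tag{2.7}$$

But

$$\int \cdots \int \frac{\partial^2 p_n}{\partial \theta^2}\,dx_1 \cdots dx_n = 0 \tag{2.8}$$

because of Condition 1. From (2.7) and (2.8) we obtain

$$E\left(\frac{\partial^2 \log p_n}{\partial \theta^2} \,\middle|\, \theta\right) = -c_n(\theta). \tag{2.9}$$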

Conditions 3 and 4 will generally be fulfilled when the stochastic dependence of $x_i$ on $x_j$ decreases sufficiently fast with increasing value of $|i - j|$. For, in such cases, the following order relations will generally hold. The standard deviation of $\partial^2 \log p_n/\partial\theta^2$ will, in general, be of the order $\sqrt{n}$. The expected value of

$$\mathop{\mathrm{l.u.b.}}_{|\theta' - \theta| \le \delta}\left|\frac{\partial^3 \log p_n}{\partial \theta'^3}\right|$$

will usually be of the order n. Finally, $c_n(\theta)/n$ will generally have a positive lower bound and a finite upper bound.
3. Proof that the maximum likelihood equation has a root which is an asymptotically efficient estimate of θ (at least in the wide sense). Let $\theta_0$ denote the true parameter value and let θ be any other value. We put

$$\Psi_n(x_1, \dots, x_n, \theta) = \frac{\partial \log p_n(x_1, \dots, x_n, \theta)}{\partial \theta}. \tag{3.1}$$

Expanding $\Psi_n(x_1, \dots, x_n, \theta)$ in a Taylor expansion around $\theta = \theta_0$ we obtain

$$\Psi_n(x_1, \dots, x_n, \theta) = \Psi_n(x_1, \dots, x_n, \theta_0) + (\theta - \theta_0)\,\Psi_n'(x_1, \dots, x_n, \theta_0) + \tfrac12(\theta - \theta_0)^2\,\Psi_n''(x_1, \dots, x_n, \theta_n'), \tag{3.2}$$

where $\theta_n'$ is some value between $\theta_0$ and θ. Dividing both sides of (3.2) by $c_n(\theta_0)$ we obtain

$$\frac{\Psi_n(x_1, \dots, x_n, \theta)}{c_n(\theta_0)} = \frac{\Psi_n(x_1, \dots, x_n, \theta_0)}{c_n(\theta_0)} + (\theta - \theta_0)\frac{\Psi_n'(x_1, \dots, x_n, \theta_0)}{c_n(\theta_0)} + \tfrac12(\theta - \theta_0)^2\frac{\Psi_n''(x_1, \dots, x_n, \theta_n')}{c_n(\theta_0)}. \tag{3.3}$$

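From Condition 3 and equation (2.9) it follows that

$$\mathop{\mathrm{plim}}_{n\to\infty} \frac{\Psi_n'(x_1, \dots, x_n, \theta_0)}{c_n(\theta_0)} = -1, \tag{3.4}$$

where the operator plim stands for convergence in probability (stochastic convergence).

According to equation (2.5), the expected value of $\Psi_n(x_1, \dots, x_n, \theta_0)$ is zero. Since the variance of $\Psi_n(x_1, \dots, x_n, \theta_0)$ is equal to $c_n(\theta_0)$, and since $\lim_{n\to\infty} c_n(\theta) = \infty$, we have

$$\mathop{\mathrm{plim}}_{n\to\infty} \frac{\Psi_n(x_1, \dots, x_n, \theta_0)}{c_n(\theta_0)} = 0. \tag{3.5}$$

It follows from Condition 4 that for any θ with $|\theta - \theta_0| \le \delta$ the quantity

$$\frac{1}{c_n(\theta_0)}\,E\left[\,|\Psi_n''(x_1, \dots, x_n, \theta_n')| \,\middle|\, \theta_0\right] \tag{3.6}$$

is a bounded function of n.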

According to Markoff's inequality, the probability that a positive random variable will exceed λ times its expected value is not greater than 1/λ. Hence it follows from (3.6) that for any ε > 0 we can find a positive $b_\varepsilon$ such that

$$\text{Prob}\left\{\frac{|\Psi_n''(x_1, \dots, x_n, \theta_n')|}{c_n(\theta_0)} \ge b_\varepsilon\right\} \le \varepsilon \tag{3.7}$$

for all positive integers n.

Let ρ be any given positive number. The probability that the maximum likelihood equation

$$\Psi_n(x_1, \dots, x_n, \theta) = 0 \tag{3.8}$$

will have a root in the interval $(\theta_0 - \rho, \theta_0 + \rho)$ converges to one as $n \to \infty$. This follows easily from (3.3), (3.4), (3.5) and (3.7). Thus we have shown that the maximum likelihood equation has a root $\hat\theta_n$ which is a consistent estimate, i.e., it satisfies the relation

$$\mathop{\mathrm{plim}}_{n\to\infty} (\hat\theta_n - \theta_0) = 0. \tag{3.9}$$

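We shall now show that if $\hat\theta_n$ is a root of the maximum likelihood equation (3.8) and if $\hat\theta_n$ is a consistent estimate, then $\hat\theta_n$ is also asymptotically efficient, at least in the wide sense. For this purpose we substitute $\hat\theta_n$ for θ in (3.3) and multiply both sides of the equation by $\sqrt{c_n(\theta_0)}$. We then obtain

$$0 = \frac{\Psi_n(x_1, \dots, x_n, \theta_0)}{\sqrt{c_n(\theta_0)}} + \sqrt{c_n(\theta_0)}\,(\hat\theta_n - \theta_0)\frac{\Psi_n'(x_1, \dots, x_n, \theta_0)}{c_n(\theta_0)} + \sqrt{c_n(\theta_0)}\,(\hat\theta_n - \theta_0)^2\,v_n, \tag{3.10}$$

where

$$v_n = \frac12\,\frac{\Psi_n''(x_1, \dots, x_n, \theta_n')}{c_n(\theta_0)}. \tag{3.11}$$

Let

$$y_n = \frac{\Psi_n(x_1, \dots, x_n, \theta_0)}{\sqrt{c_n(\theta_0)}} \quad\text{and}\quad z_n = \sqrt{c_n(\theta_0)}\,(\hat\theta_n - \theta_0). \tag{3.12}$$

Then (3.10) gives

$$-y_n = z_n\,\frac{\Psi_n'(x_1, \dots, x_n, \theta_0)}{c_n(\theta_0)} + z_n(\hat\theta_n - \theta_0)\,v_n. \tag{3.13}$$

It follows from (3.7) and (3.9) that

$$\mathop{\mathrm{plim}}_{n\to\infty} (\hat\theta_n - \theta_0)\,v_n = 0. \tag{3.14}$$

From (3.4), (3.13) and (3.14) we obtain

$$-y_n = z_n(-1 + \zeta_n), \tag{3.15}$$

where

$$\mathop{\mathrm{plim}}_{n\to\infty} \zeta_n = 0. \tag{3.16}$$

Since $E y_n = 0$ and $E y_n^2 = 1$, it follows from (3.15) and (3.16) that

$$\mathop{\mathrm{plim}}_{n\to\infty} (z_n - y_n) = 0. \tag{3.17}$$

The asymptotic efficiency (in the wide sense) of $\hat\theta_n$ is an immediate consequence of (3.17). Our main result may be summarized in the following theorem: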
Theorem. If the true value of the parameter θ is an interior point of an interval A satisfying Conditions 1–4, then the maximum likelihood equation (1.9) has a root¹ which is a consistent estimate of θ. Furthermore, any root of (1.9) which is a consistent estimate of θ is also asymptotically efficient, at least in the wide sense.

Since the maximum likelihood estimate is a root of (1.9), it follows from the above theorem that whenever the maximum likelihood estimate is consistent, it is also asymptotically efficient, at least in the wide sense.
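The theorem can be illustrated with the same hypothetical AR(1) example used above for $c_n(\theta)$. There the likelihood equation (1.9) has the single root $\hat\theta_n = \sum X_{i-1}X_i / \sum X_{i-1}^2$, and the theorem predicts that $\sqrt{c_n(\theta_0)}\,(\hat\theta_n - \theta_0)$ has mean near 0 and variance near 1 for large n. The simulation below is a sketch under those assumptions. The true value θ_0 = 0.5, n = 400 and the number of replications are arbitrary, and a small finite-sample bias remains visible in the mean.

```python
import math
import random

def c_n(theta, n):
    """Var(score) for AR(1) with X_0 = 0: sum_{i=1}^n Var(X_{i-1})."""
    total, var_prev = 0.0, 0.0
    for _ in range(n):
        total += var_prev
        var_prev = theta * theta * var_prev + 1.0
    return total

def mle(x):
    """Root of the likelihood equation: sum x_{i-1} x_i / sum x_{i-1}^2."""
    num = sum(x[i - 1] * x[i] for i in range(1, len(x)))
    den = sum(x[i - 1] ** 2 for i in range(1, len(x)))
    return num / den

def standardized_errors(theta0, n, reps, seed=11):
    """Simulated values of sqrt(c_n(theta0)) * (mle - theta0)."""
    rng = random.Random(seed)
    scale = math.sqrt(c_n(theta0, n))
    out = []
    for _rep in range(reps):
        x = [0.0]
        for _step in range(n):
            x.append(theta0 * x[-1] + rng.gauss(0.0, 1.0))
        out.append(scale * (mle(x) - theta0))
    return out

z = standardized_errors(0.5, 400, 1500)
mean = sum(z) / len(z)
var = sum((v - mean) ** 2 for v in z) / (len(z) - 1)
print(round(mean, 3), round(var, 3))   # close to 0 and 1, respectively
```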
¹ The probability that (1.9) has at least one root converges to unity as $n \to \infty$.



DISTRIBUTION OF A ROOT OF A DETERMINANTAL EQUATION

By D. N. Nanda

Institute of Statistics, University of North Carolina

Summary. S. N. Roy [2] obtained in 1943 the distribution of the maximum, the minimum and any intermediate one of the roots of certain determinantal equations. These equations are based on the covariance matrices of two samples, on the null hypothesis of equal covariance matrices in the two populations. The present paper gives a different method of working out the distribution of any of these roots under the same hypothesis. The distribution of the largest, the smallest and any intermediate root, when the roots are specified by their position in a monotonic arrangement, has been derived for p = 2, 3, 4, and 5 by the new method. The method is applicable for obtaining the distribution of the roots of an equation of any order, once the distributions of the roots of lower order equations have been worked out.


2. Introduction. Let $x = \|x_{ij}\|$ and $x^* = \|x^*_{ij}\|$ be two p-variate sample matrices with $n_1$ and $n_2$ degrees of freedom respectively. Let $S = xx'/n_1$ and $S^* = x^*x^{*\prime}/n_2$ be the covariance matrices, which under the null hypothesis are independent estimates of the same population covariance matrix. Then the joint distribution of the roots of the determinantal equation $|A - \theta(A + B)| = 0$, where $A = n_1 S$ and $B = n_2 S^*$, has been obtained by Hsu [1] in 1939. The distribution density is

$$R(l, \mu, \nu) = C\prod_{i=1}^{l}\theta_i^{(\mu-2)/2}(1 - \theta_i)^{(\nu-2)/2}\prod_{i<j}(\theta_i - \theta_j), \qquad (0 \le \theta_l \le \theta_{l-1} \le \cdots \le \theta_1 \le 1), \tag{1}$$

where C is a normalizing constant formed from products of gamma functions, $l = \min(p, n_1)$, $\mu = |p - n_1| + 1$, and $\nu = n_2 - p + 1$.

This formula also gives the joint distribution of the squares of canonical correlations on the null hypothesis that the two sets of variates are independent. Suppose the x's and w's are the observations on the two sets of canonical variates, and the x's are normally distributed independently of the w's. Then the equation for the canonical roots is $|V_{xw}V_{ww}^{-1}V_{wx} - \theta V_{xx}| = 0$, where $\theta_i = r_i^2$ and $V_{xw} = XW'$, etc. It is observed that $V_{xw}V_{ww}^{-1}V_{wx}$ is like A with $n_1 = q$, and that $V_{xx} - V_{xw}V_{ww}^{-1}V_{wx}$ is like B with $n_2 = N - q - 1$. The above equation is thus reduced to the form $|A - \theta(A + B)| = 0$. Under this condition $R(l, \mu, \nu)$ gives the joint distribution density of $r_1^2, r_2^2, \dots, r_l^2$, where $l = \min(p, q)$, $\mu = |p - q| + 1$, and $\nu = N - p - q$.

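3. Notation and preliminaries.

(a) Let

$$\prod_{i<j}(\theta_i - \theta_j) = \{1, 2, 3, \dots, l\}.$$

It is known that the value of the Vandermonde determinant

$$\begin{vmatrix} 1 & 1 & 1 & \cdots & 1 \\ \theta_1 & \theta_2 & \theta_3 & \cdots & \theta_l \\ \theta_1^2 & \theta_2^2 & \theta_3^2 & \cdots & \theta_l^2 \\ \vdots & & & & \vdots \\ \theta_1^{l-1} & \theta_2^{l-1} & \theta_3^{l-1} & \cdots & \theta_l^{l-1} \end{vmatrix}$$

is equal to $\prod_{i>j}(\theta_i - \theta_j) = (-1)^{l(l-1)/2}\{1, 2, 3, \dots, l\}$.

Then

$$\begin{vmatrix} 1 & 1 & 1 \\ \theta_1 & \theta_2 & \theta_3 \\ \theta_1^2 & \theta_2^2 & \theta_3^2 \end{vmatrix} = (\theta_2 - \theta_1)(\theta_3 - \theta_2)(\theta_3 - \theta_1) = -\{1, 2, 3\},$$

but the determinant can also, by expansion in minors of the first row, be expressed as

$$-[\theta_1\theta_2\{1, 2\} + \theta_2\theta_3\{2, 3\} + \theta_3\theta_1\{3, 1\}],$$

where $\theta_1 - \theta_2 = \{1, 2\}$. More generally, $\{a, b, \dots\}$ denotes the product of the differences $\theta_r - \theta_s$ over all pairs in which r precedes s in the list. Hence

$$\{1, 2, 3\} = \theta_1\theta_2\{1, 2\} + \theta_3\theta_1\{3, 1\} + \theta_2\theta_3\{2, 3\}. \tag{2}$$

Similarly

$$\{1, 2, 3, 4\} = \theta_1\theta_2\theta_3\{1, 2, 3\} - \theta_4\theta_1\theta_2\{4, 1, 2\} + \theta_3\theta_4\theta_1\{3, 4, 1\} - \theta_2\theta_3\theta_4\{2, 3, 4\}, \tag{3}$$

and

$$\begin{aligned}\{1, 2, 3, 4, 5\} = {} & \theta_1\theta_2\theta_3\theta_4\{1, 2, 3, 4\} + \theta_5\theta_1\theta_2\theta_3\{5, 1, 2, 3\} + \theta_4\theta_5\theta_1\theta_2\{4, 5, 1, 2\} \\ & + \theta_3\theta_4\theta_5\theta_1\{3, 4, 5, 1\} + \theta_2\theta_3\theta_4\theta_5\{2, 3, 4, 5\}. \end{aligned} \tag{4}$$

It is seen that in the successive terms the θ's are present in a decreasing order.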
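Identities (2), (3) and (4) are polynomial identities, so they can be confirmed exactly at arbitrary rational points. The sketch below is our own check, and its helper names are hypothetical. It evaluates $\{a, b, \dots\}$ with the pairs taken in list order, computes the Vandermonde determinant by the Leibniz formula, and compares both sides using `fractions.Fraction`.

```python
from fractions import Fraction
from itertools import combinations, permutations

def bracket(theta, idx):
    """{a, b, ...}: product of (theta_r - theta_s) over pairs where r precedes s in idx."""
    out = Fraction(1)
    for r, s in combinations(idx, 2):
        out *= theta[r] - theta[s]
    return out

def det(mat):
    """Determinant by the Leibniz formula (adequate for the small sizes used here)."""
    n = len(mat)
    total = Fraction(0)
    for perm in permutations(range(n)):
        inversions = sum(perm[i] > perm[j] for i, j in combinations(range(n), 2))
        term = Fraction(-1) ** inversions
        for row, col in enumerate(perm):
            term *= mat[row][col]
        total += term
    return total

def vandermonde(theta, idx):
    """Rows 1, theta, theta^2, ...; one column per index in idx."""
    return det([[theta[i] ** p for i in idx] for p in range(len(idx))])

def rhs_2(t):
    return t[1]*t[2]*bracket(t, [1, 2]) + t[3]*t[1]*bracket(t, [3, 1]) + t[2]*t[3]*bracket(t, [2, 3])

def rhs_3(t):
    return (t[1]*t[2]*t[3]*bracket(t, [1, 2, 3]) - t[4]*t[1]*t[2]*bracket(t, [4, 1, 2])
            + t[3]*t[4]*t[1]*bracket(t, [3, 4, 1]) - t[2]*t[3]*t[4]*bracket(t, [2, 3, 4]))

def rhs_4(t):
    """Sum over the five cyclic shifts [1,2,3,4], [5,1,2,3], [4,5,1,2], [3,4,5,1], [2,3,4,5]."""
    total = Fraction(0)
    for start in (1, 5, 4, 3, 2):
        idx = [(start - 1 + k) % 5 + 1 for k in range(4)]
        prod = Fraction(1)
        for i in idx:
            prod *= t[i]
        total += prod * bracket(t, idx)
    return total

# Hypothetical test point.
theta = {1: Fraction(7, 3), 2: Fraction(-1, 2), 3: Fraction(5), 4: Fraction(2, 7), 5: Fraction(-3)}
print(bracket(theta, [1, 2, 3]) == rhs_2(theta),
      bracket(theta, [1, 2, 3, 4]) == rhs_3(theta),
      bracket(theta, [1, 2, 3, 4, 5]) == rhs_4(theta))
```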
(b) Let

$$(a, b; m, n) = \Bigl[y^m(1 - y)^n\Bigr]_a^b = b^m(1 - b)^n - a^m(1 - a)^n,$$

and

$$(a, 1, b; m, n) = \int_a^b y^m(1 - y)^n\,dy;$$

then

$$(a, 1, b; m + 1, n) = -\frac{(a, b; m + 1, n + 1)}{m + n + 2} + \frac{m + 1}{m + n + 2}\,(a, 1, b; m, n), \tag{4, a}$$

by a combination of the transformations obtained by partial integration and by breaking up $(1 - y)^{n+1}$ into $(1 - y)^n - y(1 - y)^n$.
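The reduction formula (4, a) can be confirmed exactly for integer m and n. The sketch below, a check of our own, evaluates $(a, 1, b; m, n)$ through the binomial expansion of $(1 - y)^n$ in rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def bracket_ab(a, b, m, n):
    """(a, b; m, n) = b^m (1 - b)^n - a^m (1 - a)^n."""
    a, b = Fraction(a), Fraction(b)
    return b**m * (1 - b)**n - a**m * (1 - a)**n

def beta_part(a, b, m, n):
    """(a, 1, b; m, n) = integral_a^b y^m (1 - y)^n dy, exactly, via (1 - y)^n = sum C(n,k)(-y)^k."""
    a, b = Fraction(a), Fraction(b)
    total = Fraction(0)
    for k in range(n + 1):
        p = m + k + 1
        total += comb(n, k) * (-1)**k * (b**p - a**p) / p
    return total

def rhs_4a(a, b, m, n):
    """Right-hand side of (4, a)."""
    return (-bracket_ab(a, b, m + 1, n + 1) + (m + 1) * beta_part(a, b, m, n)) / (m + n + 2)

a, b = Fraction(1, 5), Fraction(3, 4)
print(all(beta_part(a, b, m + 1, n) == rhs_4a(a, b, m, n) for m in range(4) for n in range(4)))
```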

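(c) Let

$$(a, 2, 1, b; m, n) = \iint_{a < \theta_2 < \theta_1 < b} (\theta_1\theta_2)^m(1 - \theta_1)^n(1 - \theta_2)^n\{1, 2\}\,d\theta_1\,d\theta_2,$$

$$(a, 2, b, 1, c; m, n) = \iint_{a < \theta_2 < b < \theta_1 < c} (\theta_1\theta_2)^m(1 - \theta_1)^n(1 - \theta_2)^n\{1, 2\}\,d\theta_1\,d\theta_2,$$

and

$$(a, 3, b, 2, c, 1, d; m + 1, n) = \iiint_{a < \theta_3 < b < \theta_2 < c < \theta_1 < d} (\theta_1\theta_2\theta_3)^{m+1}(1 - \theta_1)^n(1 - \theta_2)^n(1 - \theta_3)^n\{1, 2, 3\}\,d\theta_1\,d\theta_2\,d\theta_3.$$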

(d) Let

$$T_a^{b;\,m,n}\,g(y) = \int_a^b y^m(1 - y)^n g(y)\,dy;$$

then

$$T_a^{b;\,m,n}\,y^h(1 - y)^l = (a, 1, b; m + h, n + l), \qquad (h, l \ge 0),$$

and

$$T_a^{b;\,m,n}\,(b, 1, c; h, l) = (a, 1, b; m, n)(b, 1, c; h, l).$$

With these preliminaries we proceed to derive the distribution of the roots.
4. Distribution of the largest root. Let us suppose that the roots are arranged in decreasing order, so that for l roots we have

$$0 < \theta_l < \theta_{l-1} < \theta_{l-2} < \cdots < \theta_2 < \theta_1 < 1.$$

If the distribution density $R(l, \mu, \nu)$ given by (1) be expressed as

$$R(l, m, n) = C(l, m, n)\prod_{i=1}^{l}\theta_i^m\prod_{i=1}^{l}(1 - \theta_i)^n\prod_{i<j}(\theta_i - \theta_j),$$

then the distribution of the largest root in the general case would be given by

$$\Pr(\theta_1 \le x) = C(l, m, n)\,(0, l, l - 1, \dots, 2, 1, x; m, n).$$


Now we shall derive the distribution of the largest root for l = 2, 3, 4, and 5.

(a) l = 2.

$$\Pr(\theta_1 \le x) = C(2, m, n)\,(0, 2, 1, x; m, n),$$

$$\begin{aligned}(0, 2, 1, x; m, n) &= \iint_{0 < \theta_2 < \theta_1 < x} (\theta_1\theta_2)^m(1 - \theta_1)^n(1 - \theta_2)^n\{1, 2\}\,d\theta_1\,d\theta_2 \\ &= \iint \theta_2^m(1 - \theta_2)^n\,\theta_1^m(1 - \theta_1)^n\{1, 2\}\,d\theta_1\,d\theta_2 \\ &= \iint_{0 < \theta_2 < \theta_1 < x} \theta_2^m(1 - \theta_2)^n\,\theta_1^{m+1}(1 - \theta_1)^n\,d\theta_1\,d\theta_2 - \iint_{0 < \theta_1 < \theta_2 < x} \theta_2^m(1 - \theta_2)^n\,\theta_1^{m+1}(1 - \theta_1)^n\,d\theta_1\,d\theta_2. \end{aligned}$$

The limits in the successive integrals are so adjusted as to keep the integrand the same. Then, using the notation of section 3(d) and equation (4, a),

$$(0, 2, 1, x; m, n) = T_0^{x;\,m,n}(y, 1, x; m + 1, n) - T_0^{x;\,m,n}(0, 1, y; m + 1, n), \tag{5}$$

or

$$\begin{aligned}(0, 2, 1, x; m, n) = {} & T_0^{x;\,m,n}\Bigl[-\frac{(y, x; m + 1, n + 1)}{m + n + 2} + \frac{m + 1}{m + n + 2}(y, 1, x; m, n)\Bigr] \\ & - T_0^{x;\,m,n}\Bigl[-\frac{(0, y; m + 1, n + 1)}{m + n + 2} + \frac{m + 1}{m + n + 2}(0, 1, y; m, n)\Bigr]. \end{aligned}$$

Now by a change in the order of integration,

$$T_0^{x;\,m,n}\bigl[(0, 1, y; m, n) - (y, 1, x; m, n)\bigr] = 0.$$

Therefore

$$\begin{aligned}(m + n + 2)(0, 2, 1, x; m, n) &= T_0^{x;\,m,n}\bigl[2(0, y; m + 1, n + 1) - (0, x; m + 1, n + 1)\bigr] \\ &= 2(0, 1, x; 2m + 1, 2n + 1) - (0, x; m + 1, n + 1)(0, 1, x; m, n). \end{aligned}$$

Hence

$$\begin{aligned}\Pr(\theta_1 \le x) &= \frac{C(2, m, n)}{m + n + 2}\bigl[2(0, 1, x; 2m + 1, 2n + 1) - (0, x; m + 1, n + 1)(0, 1, x; m, n)\bigr] \\ &= \frac{C(2, m, n)}{m + n + 2}\Bigl[2\int_0^x y^{2m+1}(1 - y)^{2n+1}\,dy - x^{m+1}(1 - x)^{n+1}\int_0^x y^m(1 - y)^n\,dy\Bigr]. \end{aligned}$$



(b) l = 3. For this case we need certain results for l = 2, which can be easily obtained and are given below:

$$(a, 2, 1, b; m, n) = \frac{2}{m + n + 2}(a, 1, b; 2m + 1, 2n + 1) - \frac{1}{m + n + 2}\bigl[(0, a; m + 1, n + 1) + (0, b; m + 1, n + 1)\bigr](a, 1, b; m, n) \tag{6}$$

and

$$(a, 2, b, 1, c; m, n) = \frac{1}{m + n + 2}\bigl[-(0, a; m + 1, n + 1)(b, 1, c; m, n) + (0, b; m + 1, n + 1)(a, 1, c; m, n) - (0, c; m + 1, n + 1)(a, 1, b; m, n)\bigr]. \tag{7}$$

Now

$$\begin{aligned}(0, 3, 2, 1, x; m, n) &= \iiint_{0 < \theta_3 < \theta_2 < \theta_1 < x} (\theta_1\theta_2\theta_3)^m(1 - \theta_1)^n(1 - \theta_2)^n(1 - \theta_3)^n\{1, 2, 3\}\,d\theta_1\,d\theta_2\,d\theta_3 \\ &= \iiint (\theta_1\theta_2\theta_3)^m(1 - \theta_1)^n(1 - \theta_2)^n(1 - \theta_3)^n\bigl[\theta_1\theta_2\{1, 2\} + \theta_3\theta_1\{3, 1\} + \theta_2\theta_3\{2, 3\}\bigr]\,d\theta_1\,d\theta_2\,d\theta_3 \end{aligned}$$

(using equation (2))

$$= \iiint_{0 < \theta_3 < \theta_2 < \theta_1 < x} \theta_3^m(1 - \theta_3)^n(\theta_1\theta_2)^{m+1}(1 - \theta_1)^n(1 - \theta_2)^n\{1, 2\}\,d\theta_1\,d\theta_2\,d\theta_3 + \iiint \cdots + \iiint \cdots,$$

or

$$(0, 3, 2, 1, x; m, n) = T_0^{x;\,m,n}(y, 2, 1, x; m + 1, n) + T_0^{x;\,m,n}(0, 1, y, 2, x; m + 1, n) + T_0^{x;\,m,n}(0, 2, 1, y; m + 1, n),$$

but the θ's are always to be arranged in the same order; hence

$$(0, 3, 2, 1, x; m, n) = T_0^{x;\,m,n}(y, 2, 1, x; m + 1, n) - T_0^{x;\,m,n}(0, 2, y, 1, x; m + 1, n) + T_0^{x;\,m,n}(0, 2, 1, y; m + 1, n).$$

Using equations (6) and (7), we have

$$\begin{aligned}(0, 3, 2, 1, x; m, n) &= \frac{T_0^{x;\,m,n}}{m + n + 3}\Bigl\{2(y, 1, x; 2m + 3, 2n + 1) - (y, 1, x; m + 1, n)\bigl[(0, y; m + 2, n + 1) + (0, x; m + 2, n + 1)\bigr] \\ &\qquad - (0, 1, x; m + 1, n)(0, y; m + 2, n + 1) + (0, 1, y; m + 1, n)(0, x; m + 2, n + 1) \\ &\qquad + 2(0, 1, y; 2m + 3, 2n + 1) - (0, 1, y; m + 1, n)(0, y; m + 2, n + 1)\Bigr\} \\ &= \frac{T_0^{x;\,m,n}}{m + n + 3}\Bigl\{2(y, 1, x; 2m + 3, 2n + 1) + 2(0, 1, y; 2m + 3, 2n + 1) \\ &\qquad - (0, y; m + 2, n + 1)\bigl[(0, 1, x; m + 1, n) + (0, 1, y; m + 1, n) + (y, 1, x; m + 1, n)\bigr] \\ &\qquad - (0, x; m + 2, n + 1)\bigl[(y, 1, x; m + 1, n) - (0, 1, y; m + 1, n)\bigr]\Bigr\} \\ &= \frac{T_0^{x;\,m,n}}{m + n + 3}\Bigl\{2(0, 1, x; 2m + 3, 2n + 1) - 2(0, y; m + 2, n + 1)(0, 1, x; m + 1, n) \\ &\qquad - (0, x; m + 2, n + 1)\bigl[(y, 1, x; m + 1, n) - (0, 1, y; m + 1, n)\bigr]\Bigr\}. \end{aligned}$$

Using equation (5), we have

$$(0, 3, 2, 1, x; m, n) = \frac{1}{m + n + 3}\bigl\{2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m, n) - 2(0, 1, x; 2m + 2, 2n + 1)(0, 1, x; m + 1, n) - (0, x; m + 2, n + 1)(0, 2, 1, x; m, n)\bigr\}.$$

Hence

$$\begin{aligned}\Pr(\theta_1 \le x) = \frac{C(3, m, n)}{m + n + 3}\bigl\{ & 2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m, n) \\ & - 2(0, 1, x; 2m + 2, 2n + 1)(0, 1, x; m + 1, n) - (0, x; m + 2, n + 1)(0, 2, 1, x; m, n)\bigr\}. \end{aligned} \tag{8}$$
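For non-negative integers m and n every integrand here is a polynomial. This means the results for l = 2 and l = 3 can be checked in exact rational arithmetic. The sketch below is our own check, with hypothetical helper names:

- it expands $\{1, \dots, l\}$ into monomials;
- it integrates over $0 < \theta_l < \cdots < \theta_1 < x$ one variable at a time;
- it compares the result with the closed form for $(0, 2, 1, x; m, n)$ derived above, and with (8) divided by $C(3, m, n)$.

```python
from fractions import Fraction
from itertools import product

def pmul(p, q):
    """Product of polynomials given as coefficient lists (index = power)."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def antideriv(p):
    """Coefficients of t -> integral_0^t p(y) dy."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(p)]

def peval(p, x):
    return sum(c * x**k for k, c in enumerate(p))

def weight(m, n, extra=0):
    """Coefficients of y^(m + extra) (1 - y)^n."""
    p = [Fraction(0)] * (m + extra) + [Fraction(1)]
    for _ in range(n):
        p = pmul(p, [Fraction(1), Fraction(-1)])
    return p

def ordered(l, m, n, x):
    """(0, l, ..., 2, 1, x; m, n): integral over 0 < th_l < ... < th_1 < x."""
    pairs = [(i, j) for i in range(l) for j in range(i + 1, l)]
    monomials = {}
    for pick in product((0, 1), repeat=len(pairs)):      # expand prod_{i<j} (th_i - th_j)
        exps, sign = [0] * l, 1
        for (i, j), c in zip(pairs, pick):
            if c == 0:
                exps[i] += 1
            else:
                exps[j] += 1
                sign = -sign
        monomials[tuple(exps)] = monomials.get(tuple(exps), 0) + sign
    total = Fraction(0)
    for exps, coef in monomials.items():
        if coef:
            poly = antideriv(weight(m, n, exps[-1]))       # innermost variable th_l
            for k in range(l - 2, -1, -1):                 # then th_{l-1}, ..., th_1
                poly = antideriv(pmul(weight(m, n, exps[k]), poly))
            total += coef * peval(poly, x)
    return total

def beta_part(x, m, n):
    """(0, 1, x; m, n)."""
    return peval(antideriv(weight(m, n)), x)

def endpoint(x, m, n):
    """(0, x; m, n) = x^m (1 - x)^n."""
    return x**m * (1 - x)**n

def rhs_l2(m, n, x):
    return (2 * beta_part(x, 2*m + 1, 2*n + 1)
            - endpoint(x, m + 1, n + 1) * beta_part(x, m, n)) / (m + n + 2)

def rhs_l3(m, n, x):
    return (2 * beta_part(x, 2*m + 3, 2*n + 1) * beta_part(x, m, n)
            - 2 * beta_part(x, 2*m + 2, 2*n + 1) * beta_part(x, m + 1, n)
            - endpoint(x, m + 2, n + 1) * ordered(2, m, n, x)) / (m + n + 3)

x = Fraction(2, 5)
for m, n in [(0, 0), (1, 2), (3, 1)]:
    # each comparison should print True
    print(m, n, ordered(2, m, n, x) == rhs_l2(m, n, x), ordered(3, m, n, x) == rhs_l3(m, n, x))
```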

(c) l = 4. In order to determine (0, 4, 3, 2, 1, x; m, n) we need the values of (a, 3, 2, 1, b; m, n), (a, 3, b, 2, 1, c; m, n) and (a, 3, 2, b, 1, c; m, n), which are obtained according to the procedure given above.

Now

$$\begin{aligned}(0, 4, 3, 2, 1, x; m, n) &= \int_{0 < \theta_4 < \theta_3 < \theta_2 < \theta_1 < x} \theta_4^m(1 - \theta_4)^n(\theta_1\theta_2\theta_3)^m(1 - \theta_1)^n(1 - \theta_2)^n(1 - \theta_3)^n\{1, 2, 3, 4\}\,d\theta_1\,d\theta_2\,d\theta_3\,d\theta_4 \\ &= \int \theta_4^m(1 - \theta_4)^n(\theta_1\theta_2\theta_3)^m(1 - \theta_1)^n(1 - \theta_2)^n(1 - \theta_3)^n \\ &\qquad \cdot\bigl[\theta_1\theta_2\theta_3\{1, 2, 3\} - \theta_4\theta_1\theta_2\{4, 1, 2\} + \theta_3\theta_4\theta_1\{3, 4, 1\} - \theta_2\theta_3\theta_4\{2, 3, 4\}\bigr]\,d\theta_1\,d\theta_2\,d\theta_3\,d\theta_4 \\ &= \int_{0 < \theta_4 < \theta_3 < \theta_2 < \theta_1 < x} \theta_4^m(1 - \theta_4)^n(\theta_1\theta_2\theta_3)^{m+1}(1 - \theta_1)^n(1 - \theta_2)^n(1 - \theta_3)^n\{1, 2, 3\}\,d\theta_1\,d\theta_2\,d\theta_3\,d\theta_4 - \int \cdots + \int \cdots - \int \cdots \\ &= T_0^{x;\,m,n}(y, 3, 2, 1, x; m + 1, n) - T_0^{x;\,m,n}(0, 1, y, 3, 2, x; m + 1, n) \\ &\qquad + T_0^{x;\,m,n}(0, 2, 1, y, 3, x; m + 1, n) - T_0^{x;\,m,n}(0, 3, 2, 1, y; m + 1, n) \\ &= T_0^{x;\,m,n}(y, 3, 2, 1, x; m + 1, n) - T_0^{x;\,m,n}(0, 3, y, 2, 1, x; m + 1, n) \\ &\qquad + T_0^{x;\,m,n}(0, 3, 2, y, 1, x; m + 1, n) - T_0^{x;\,m,n}(0, 3, 2, 1, y; m + 1, n). \end{aligned}$$

Using the results for (a, 3, 2, 1, b; m, n), (a, 3, b, 2, 1, c; m, n) and (a, 3, 2, b, 1, c; m, n), we have $\Pr(\theta_1 \le x)$ equal to

$$\begin{aligned} C(4, m, n)\,(0, 4, 3, 2, 1, x; m, n) = \frac{C(4, m, n)}{m + n + 4}\Bigl\{ & 2(0, 1, x; 2m + 5, 2n + 1)(0, 2, 1, x; m, n) \\ & - \frac{2(0, 1, x; 2m + 4, 2n + 1)}{m + n + 3}\bigl[2(0, 1, x; 2m + 2, 2n + 1) - (0, x; m + 2, n + 1)(0, 1, x; m, n) + (m + 2)(0, 2, 1, x; m, n)\bigr] \\ & + 2(0, 1, x; 2m + 3, 2n + 1)(0, 2, 1, x; m + 1, n) \\ & - (0, x; m + 3, n + 1)(0, 3, 2, 1, x; m, n)\Bigr\}. \end{aligned} \tag{9}$$

(d) Z <= E, In the evaluation of the distribution of the largest root for l - 5; 
the following parts need to be calculated: 

(a, 4, 3, 2, 1, 6; to, 7i), (a, 4, b , 3, 2, 1, cj to, ft), (a, 4, 3, b, 2,1, c\ to, ft), 

(a, 4, 3, 2, h, 1, c; to, ?i). 
Proceeding along the lines indicated in the previous sections we get

$$
\begin{aligned}
\Pr(\theta_1\le x)=\frac{C(5,m,n)}{m+n+5}\Bigl\{&2(0,1,x;2m+7,2n+1)(0,3,2,1,x;m,n)\\
&-\frac{2(0,1,x;2m+6,2n+1)}{m+n+4}\bigl[2(0,1,x;2m+4,2n+1)(0,1,x;m,n)-2(0,1,x;2m+3,2n+1)(0,1,x;m+1,n)\\
&\qquad-(0,x;m+3,n+1)(0,2,1,x;m,n)+(m+3)(0,3,2,1,x;m,n)\bigr]\\
&+\frac{2(0,1,x;2m+5,2n+1)}{m+n+4}\Bigl[2(0,1,x;2m+5,2n+1)(0,1,x;m,n)-2(0,1,x;2m+3,2n+1)(0,1,x;m+2,n)\\
&\qquad-\frac{(0,x;m+3,n+1)}{m+n+3}\bigl[2(0,1,x;2m+2,2n+1)-(0,x;m+2,n+1)(0,1,x;m,n)+(m+2)(0,2,1,x;m,n)\bigr]\Bigr]\\
&-2(0,3,2,1,x;m+1,n)(0,1,x;2m+4,2n+1)\\
&-(0,x;m+4,n+1)(0,4,3,2,1,x;m,n)\Bigr\}.
\end{aligned}\tag{10}
$$

It is evident now that the above method can be used to derive the distribution
for any value of l.


5. Distribution of the smallest root. Let $\Pr[\theta_1 \le x \mid m, n] = P(x \mid m, n)$, where $\theta_1$ is the largest root. Let us make the following transformations in the $\theta(l, m, n)$ distribution:

$$r_1 = 1-\theta_l,\quad r_2 = 1-\theta_{l-1},\quad \ldots,\quad r_l = 1-\theta_1;$$

then since $0<\theta_l<\theta_{l-1}<\cdots<\theta_1<1$, we have $0<r_l<r_{l-1}<\cdots<r_1<1$, and thus the domain of integration does not change. Hence the joint distribution of the $r$'s can be expressed as

$$C(l,m,n)\prod_{i=1}^{l} r_i^{\,n}(1-r_i)^{m}\prod_{i<j}(r_i-r_j),\qquad 0<r_l<\cdots<r_1<1.$$

Thus the $r$'s have the same distribution as the $\theta$'s, but with $m$ and $n$ interchanged. Therefore

$$\Pr(\theta_l\le x)=\Pr(1-r_1\le x)=1-\Pr(r_1\le 1-x)=1-P(1-x\mid n,m).$$

Hence, to obtain the distribution of the smallest root, we change $x$ into $1-x$ and interchange $m$ and $n$ in the distribution of the largest root, and subtract the resulting probability from 1. The distributions of the smallest root are given below for $l = 2, 3, 4$ and $5$.
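The reflection argument can be checked numerically on the unnormalised density. This sketch assumes that the joint density of the ordered roots θ1 > θ2 > ... > θl is proportional to ∏ θ_i^m (1 − θ_i)^n ∏_{i<j} (θ_i − θ_j), as in the display above. The constant C(l, m, n) is omitted because the transformation has unit Jacobian. The function names are hypothetical.

```python
import math
import random


def root_density(theta, m, n):
    """Unnormalised joint density; theta lists the roots from largest to smallest."""
    value = 1.0
    for t in theta:
        value *= t ** m * (1.0 - t) ** n
    for i in range(len(theta)):
        for j in range(i + 1, len(theta)):
            value *= theta[i] - theta[j]   # positive because theta is decreasing
    return value


def reflect(theta):
    """r_1 = 1 - theta_l, ..., r_l = 1 - theta_1, again listed largest first."""
    return [1.0 - t for t in reversed(theta)]


rng = random.Random(0)
theta = sorted((rng.random() for _ in range(4)), reverse=True)
same = math.isclose(root_density(theta, 1.5, 3.0),
                    root_density(reflect(theta), 3.0, 1.5),  # m and n interchanged
                    rel_tol=1e-9)
print(same)
```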

(i) l = 2.

$$\Pr(\theta_2\le x)=1-\Pr(\theta_1\le 1-x\mid n,m)=1-\frac{C(2,n,m)}{m+n+2}\bigl\{2(0,1,1-x;2n+2,2m+1)-(0,1-x;n+1,m+1)(0,1,1-x;n,m)\bigr\}.\tag{11}$$

(ii) l = 3.

$$
\begin{aligned}
\Pr(\theta_3\le x)=1-\frac{C(3,n,m)}{m+n+3}\bigl\{&2(0,1,1-x;2n+3,2m+1)(0,1,1-x;n,m)\\
&-2(0,1,1-x;n+1,m)(0,1,1-x;2n+2,2m+1)\\
&-(0,1-x;n+2,m+1)(0,2,1,1-x;n,m)\bigr\}.
\end{aligned}\tag{12}
$$

(iii) l = 4.

$$
\begin{aligned}
\Pr(\theta_4\le x)=1-\frac{C(4,n,m)}{m+n+4}\Bigl\{&2(0,1,1-x;2n+5,2m+1)(0,2,1,1-x;n,m)\\
&-\frac{2(0,1,1-x;2n+4,2m+1)}{m+n+3}\bigl[2(0,1,1-x;2n+2,2m+1)-(0,1-x;n+2,m+1)(0,1,1-x;n,m)\\
&\qquad+(n+2)(0,2,1,1-x;n,m)\bigr]\\
&+2(0,1,1-x;2n+3,2m+1)(0,2,1,1-x;n+1,m)\\
&-(0,1-x;n+3,m+1)(0,3,2,1,1-x;n,m)\Bigr\}.
\end{aligned}\tag{13}
$$
(iv) l = 5.

$$
\begin{aligned}
\Pr(\theta_5\le x)=1-\frac{C(5,n,m)}{m+n+5}\Bigl\{&2(0,1,1-x;2n+7,2m+1)(0,3,2,1,1-x;n,m)\\
&-\frac{2(0,1,1-x;2n+6,2m+1)}{m+n+4}\bigl[2(0,1,1-x;2n+4,2m+1)(0,1,1-x;n,m)-2(0,1,1-x;2n+3,2m+1)(0,1,1-x;n+1,m)\\
&\qquad-(0,1-x;n+3,m+1)(0,2,1,1-x;n,m)+(n+3)(0,3,2,1,1-x;n,m)\bigr]\\
&+\frac{2(0,1,1-x;2n+5,2m+1)}{m+n+4}\Bigl[2(0,1,1-x;2n+5,2m+1)(0,1,1-x;n,m)-2(0,1,1-x;2n+3,2m+1)(0,1,1-x;n+2,m)\\
&\qquad-\frac{(0,1-x;n+3,m+1)}{m+n+3}\bigl[2(0,1,1-x;2n+2,2m+1)-(0,1-x;n+2,m+1)(0,1,1-x;n,m)+(n+2)(0,2,1,1-x;n,m)\bigr]\Bigr]\\
&-2(0,3,2,1,1-x;n+1,m)(0,1,1-x;2n+4,2m+1)\\
&-(0,1-x;n+4,m+1)(0,4,3,2,1,1-x;n,m)\Bigr\}.
\end{aligned}\tag{14}
$$


6. Distribution of any intermediate root.

(i) l = 3.

$$\Pr(\theta_2\le x)=\Pr(0<\theta_3<\theta_2<\theta_1<x)+\Pr(0<\theta_3<\theta_2<x<\theta_1)=C(3,m,n)\bigl[(0,3,2,1,x;m,n)+(0,3,2,x,1;m,n)\bigr],$$

as the two events are mutually exclusive, or

$$\Pr(\theta_2\le x)=C(3,m,n)\bigl[(0,3,2,1,x;m,n)+(0,3,2,x,1,e;m,n)\bigr],\qquad\text{where } e=1,$$

$$
\begin{aligned}
=\frac{C(3,m,n)}{m+n+3}\Bigl\{&2(0,1,x;2m+3,2n+1)(0,1,x;m,n)-2(0,1,x;m+1,n)(0,1,x;2m+2,2n+1)\\
&-(0,x;m+2,n+1)(0,2,1,x;m,n)\\
&+(x,1,e;m,n)\bigl[2(0,1,x;2m+3,2n+1)-(0,x;m+2,n+1)(0,1,x;m+1,n)\bigr]\\
&-\frac{(x,e;m+2,n+1)}{m+n+2}\bigl[2(0,1,x;2m+1,2n+1)-(0,x;m+1,n+1)(0,1,x;m,n)\bigr]\\
&+(x,1,e;m+1,n)(0,1,x;m,n)(0,x;m+2,n+1)\\
&-2(x,1,e;m+1,n)(0,1,x;2m+2,2n+1)\Bigr\}.
\end{aligned}\tag{15}
$$
(ii) l = 4.

$$
\begin{aligned}
\Pr(\theta_2\le x)&=\Pr(0<\theta_4<\theta_3<\theta_2<\theta_1<x;m,n)+\Pr(0<\theta_4<\theta_3<\theta_2<x<\theta_1;m,n)\\
&=C(4,m,n)\bigl[(0,4,3,2,1,x;m,n)+(0,4,3,2,x,1;m,n)\bigr]
\end{aligned}
$$

and

$$
\begin{aligned}
\Pr(\theta_3\le x)&=\Pr(0<\theta_4<\theta_3<\theta_2<\theta_1<x;m,n)+\Pr(0<\theta_4<\theta_3<\theta_2<x<\theta_1;m,n)\\
&\quad+\Pr(0<\theta_4<\theta_3<x<\theta_2<\theta_1;m,n)\\
&=C(4,m,n)\bigl[(0,4,3,2,1,x;m,n)+(0,4,3,2,x,1;m,n)+(0,4,3,x,2,1;m,n)\bigr].
\end{aligned}
$$

The different parts of these probabilities can be evaluated as indicated in section 4(d). Thus the method already indicated for obtaining the distribution of the largest root also gives the distribution of any one of the roots.
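The bookkeeping behind these decompositions is simple. The event θ_j ≤ x occurs exactly when s = 0, 1, ..., j − 1 of the roots exceed x. Each case corresponds to placing x just below the s largest roots in the region symbol. The sketch below uses a hypothetical function name and generates those symbols, omitting the arguments "; m, n" and any upper limit e = 1. Pr(θ_j ≤ x) is then C(l, m, n) times the sum of the corresponding parts.

```python
def regions_for_root(l, j):
    """Region symbols whose sum, times C(l, m, n), gives Pr(theta_j <= x).

    Case s (s = 0, ..., j - 1) has exactly s roots above x, so x is placed
    immediately below the s largest roots theta_s, ..., theta_1.
    """
    labels = [str(v) for v in range(l, 0, -1)]  # "l", ..., "1" (theta_l smallest)
    symbols = []
    for s in range(j):
        parts = ["0"] + labels[:l - s] + ["x"] + labels[l - s:]
        symbols.append("(" + ", ".join(parts) + ")")
    return symbols


print(regions_for_root(4, 3))
# ['(0, 4, 3, 2, 1, x)', '(0, 4, 3, 2, x, 1)', '(0, 4, 3, x, 2, 1)']
```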

7. Further problems. It is intended to prepare probability distribution
tables for small values of l. The results obtained in this paper are found to be
useful in finding the distribution of the sum of the roots when the numbers of 
canonical variates in two sets differ by one. This problem is, however, being 
investigated further. 
A k-SAMPLE SLIPPAGE TEST FOR AN EXTREME POPULATION

By Frederick Mosteller
Harvard University 

1. Summary. A test is proposed for deciding whether one of k populations
has slipped to the right of the rest, under the null hypothesis that all populations
are continuous and identical. The procedure is to pick the sample with the largest
observation, and to count the number of observations r in it which exceed all
observations of all other samples. If all samples are of the same size n, n large,
the probability of getting r or more such observations, when the null hypothesis
is true, is about $k^{1-r}$.

Some remarks are made about kinds of errors in testing hypotheses.


2. Introduction. The purpose of this paper is to describe a significance test
connected with a statistical question called by the present author "the problem
of the greatest one." Suppose there are several continuous populations
$f(x-a_1), f(x-a_2), \ldots, f(x-a_k)$ which are identical except for rigid translations or
slippages. Suppose further that the form of the populations and the values of
the $a_i$ are unknown. Then on the basis of samples from the k populations we
may wish to test the hypothesis that some population has slipped further to the
right, say, than any other. In other words, we may ask whether there exists an $a_i$ with

$$a_i>\max(a_1,a_2,\ldots,a_{i-1},a_{i+1},\ldots,a_k).$$

From the point of view of testing hypotheses, the existence of such an $a_i$ is taken
to be the alternative hypothesis. A significance test will depend also on the null
hypothesis. We shall take as the null hypothesis the assumption that all the $a$'s
are equal: $a_1=a_2=\cdots=a_k$.

Using these assumptions it is possible to obtain parameter-free significance
tests that some population has a larger location parameter (mean, median,
quantile, say) than any of the other populations.

The problem of the greatest one is of considerable practical importance.
Among several processes, techniques, or therapies of approximately equal cost,
we often wish to pick out the best one as measured by some characteristic.
Furthermore, we often wish to make a test of the significance of one of the
methods against the others after noticing that on the basis of the sample values
a particular method seems to be best. The test provided in this paper allows
us to examine the data before making the test of significance.

The test has the advantage of being rapid and easy to apply. However, the
test is probably not very powerful, and in the form presented here it applies
only to samples of equal size.
3. The test. Suppose we have k samples of size n each. It is desired to
test the alternative hypothesis that one of the populations, from which the
samples were drawn, has been rigidly translated to the right relative to the
remaining populations. The null hypothesis is that all the populations have the
same location parameter.

The test consists in arranging the observations in all the samples from greatest
to least, and observing, for the sample with the largest observation, the number
of observations r which exceed all the observations in the k − 1 other samples.
If $r \ge r_\alpha$ we reject the null hypothesis that all the populations are identical;
instead we accept the hypothesis that the sample with the largest observation
came from the population with the rightmost location parameter, that is, that this
population has slipped to the right of the rest. If $r < r_\alpha$, we accept the null
hypothesis.

The statements just made are not quite usual for accepting and rejecting
hypotheses. Classically one would merely accept or reject the hypothesis that
the $a$'s are all equal. The statements just made seem preferable for the present
purpose.
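The procedure can be sketched in a few lines. This is an illustration, not the author's code. The function name and the `direction` switch (for slippage to the left, as in the example that follows) are our own. Ties are assumed not to occur, in keeping with continuous populations.

```python
def slippage_r(samples, direction="right"):
    """Return (index of the extreme sample, r) for the k-sample slippage test.

    r counts the observations in the sample holding the most extreme value
    that are more extreme than every observation in the other k - 1 samples.
    """
    sign = 1 if direction == "right" else -1
    flipped = [[sign * v for v in s] for s in samples]
    best = max(range(len(flipped)), key=lambda i: max(flipped[i]))
    rest_max = max(v for i, s in enumerate(flipped) if i != best for v in s)
    r = sum(1 for v in flipped[best] if v > rest_max)
    return best, r


# Three small samples; the second holds the largest value (11).
print(slippage_r([[1, 2, 3], [4, 10, 11], [0, 5, 6]]))  # (1, 2)
```

The resulting r is then compared with the critical value $r_\alpha$ for the given k and n.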

Example. The following data arranged from least to greatest indicate the 
difference in log reaction times of an individual and a control group to three 
types of words on a word-association test. The differences in log reaction
times have been multiplied by 100 for convenience. Longer reaction times for 
the individual are positive, shorter ones are negative. Does one type of word 
require a shorter reaction time for the individual relative to the control group 
than any other? 


Concrete 

Abstract 

Emotional 
-16

-6

-11 

-5 

-5 

-3 

-3 

-5 

-2 

-2 

-4 
-1

-1

-1 

1 

0 
3

3

1 

12 

9 

8 

13 

11 

10 

13 

12 

16 

15 

29 

20 

28 


Here we have k = 3 samples of size n = 14 each. We note that the Abstract
column has the most negative deviation, −16, and that there are two observations
in that column which are less than all the observations in the other columns.
Consequently r = 2. Under the null hypothesis the probability of obtaining
2 or more observations in one column less than all the observations in
the others is about .33, so the null hypothesis is not rejected.
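The figure .33 is the large-sample value $k^{1-r} = 1/3$. For comparison, the exact value can be computed from the formula derived in section 4, P(r) = k C(n, r)/C(kn, r). A minimal sketch (the function name is hypothetical):

```python
from fractions import Fraction
from math import comb


def prob_r_or_more(k, n, r):
    """Exact null probability k * C(n, r) / C(kn, r) of r or more extreme observations."""
    return Fraction(k * comb(n, r), comb(k * n, r))


exact = prob_r_or_more(3, 14, 2)  # k = 3 columns, n = 14, r = 2
print(exact, round(float(exact), 3))  # 13/41 0.317
print(round(3 ** (1 - 2), 3))         # large-n approximation k**(1 - r): 0.333
```

Neither value comes close to a conventional significance level, so the conclusion is the same whichever is used.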


4. Derivation of test. Suppose we have k samples of size n, all drawn from
the same continuous distribution function f(x). Arranging observations within
samples in order of magnitude, the samples $O_i$ are:

$$O_1: x_{11}, x_{12}, \ldots, x_{1n};\quad O_2: x_{21}, x_{22}, \ldots, x_{2n};\quad \ldots;\quad O_k: x_{k1}, x_{k2}, \ldots, x_{kn}.$$

If we consider some one sample $O_i$ separately, we can inquire about the
probability that exactly r of its observations are greater than the greatest
observation in the other k − 1 samples.

The total number of arrangements of the kn observations is

$$T=\frac{(kn)!}{(n!)^{k}}. \tag{1}$$

The number of ways of getting all n observations of $O_i$ to be greater than all
observations in the remaining samples is

$$N(n)=\frac{[(k-1)n]!}{(n!)^{k-1}}. \tag{2}$$

The number of ways of getting exactly n − 1 observations of $O_i$ greater than
all observations in the remaining samples is

$$N(n-1)=\frac{[(k-1)n+1]!}{(n!)^{k-1}\,1!}-\frac{[(k-1)n]!}{(n!)^{k-1}\,0!}. \tag{3}$$

More generally, the number of ways of getting exactly r = n − u observations of $O_i$ to be
greater than all other observations in the remaining samples is

$$N(n-u)=\frac{[(k-1)n+u]!}{(n!)^{k-1}\,u!}-\frac{[(k-1)n+u-1]!}{(n!)^{k-1}\,(u-1)!}. \tag{4}$$

Therefore the number of ways of getting a run of r = n − u or more observations
in $O_i$ greater than the rest is just

$$S(n-u)=\sum_{i=n-u}^{n}N(i)=\frac{[(k-1)n+u]!}{(n!)^{k-1}\,u!}. \tag{5}$$
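The counts (1) and (5) can be confirmed by brute force for small k and n. The sketch below uses illustrative names. It lists every distinct arrangement of sample labels, from the greatest observation down, and counts those whose first r labels all belong to one fixed sample. It is practical only for very small kn.

```python
from itertools import permutations
from math import factorial


def arrangements(k, n):
    """All distinct orderings (greatest first) of k sample labels, each used n times."""
    labels = [s for s in range(k) for _ in range(n)]
    return set(permutations(labels))


def brute_force_S(k, n, r):
    """Arrangements in which the r greatest observations all come from sample 0."""
    return sum(1 for seq in arrangements(k, n) if all(lab == 0 for lab in seq[:r]))


def formula_S(k, n, r):
    """Equation (5) with u = n - r."""
    u = n - r
    return factorial((k - 1) * n + u) // (factorial(n) ** (k - 1) * factorial(u))


k, n = 3, 2
print(len(arrangements(k, n)), factorial(k * n) // factorial(n) ** k)  # 90 90
print([brute_force_S(k, n, r) for r in (1, 2)], [formula_S(k, n, r) for r in (1, 2)])  # [30, 6] [30, 6]
```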


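However, we do not choose our sample $O_i$ at random or preassign it, as the
demonstration has thus far supposed. Instead we choose that sample which contains
the largest observation among all k samples. This condition is taken into account
by multiplying by the factor k, since at most one sample can have r ≥ 1 observations
exceeding all observations in the others. Consequently the probability that the sample
with the largest observation has r or more observations greater than all observations
in the other samples is

$$P(r)=\frac{k\,S(r)}{T}=\frac{k\,(n!)\,(kn-r)!}{(kn)!\,(n-r)!}. \tag{6}$$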
As an incidental check we note in passing that

$$P(1)=\frac{k\,(n!)\,(kn-1)!}{(kn)!\,(n-1)!}=\frac{kn}{kn}=1.$$

We note that equation (6) may be rewritten as

$$P(r)=k\binom{n}{r}\Big/\binom{kn}{r}, \tag{7}$$

which is a useful form for some computations.
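Forms (6) and (7) can be compared in exact arithmetic, together with the check P(1) = 1. A short sketch with illustrative function names:

```python
from fractions import Fraction
from math import comb, factorial


def p_eq6(k, n, r):
    """Equation (6): k n! (kn - r)! / ((kn)! (n - r)!)."""
    return Fraction(k * factorial(n) * factorial(k * n - r),
                    factorial(k * n) * factorial(n - r))


def p_eq7(k, n, r):
    """Equation (7): k C(n, r) / C(kn, r)."""
    return Fraction(k * comb(n, r), comb(k * n, r))


checks = [p_eq6(k, n, r) == p_eq7(k, n, r)
          for k in range(2, 6) for n in range(1, 8) for r in range(1, n + 1)]
print(all(checks), p_eq6(4, 5, 1))  # True 1
```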

Table I gives the probability of observing r or more observations in the
sample with the largest observation, among k samples of size n, which are more
extreme in a preassigned direction than any of the observations in the remaining
k − 1 samples.


5. Approximations. If we use Stirling's formula and approximations for
$(1+a)^r$ for small values of a and r, we can write an approximation for equation
(6) for large values of n with r and k fixed as follows:

$$P(r)\approx\frac{1}{k^{r-1}}\left(1-\frac{r(r-1)(k-1)}{2kn}\right). \tag{8}$$

For very large n equation (8) yields

$$P(r)\approx\frac{1}{k^{r-1}}, \tag{9}$$

which is the value given in Table I for n = ∞. For many purposes the result
given by equation (9) is quite adequate, as a glance at Table I will indicate.
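How quickly (8) and (9) approach the exact value (7) can be seen numerically. A sketch; the function names are illustrative:

```python
from math import comb


def p_exact(k, n, r):
    return k * comb(n, r) / comb(k * n, r)                           # equation (7)


def p_approx8(k, n, r):
    return k ** (1 - r) * (1 - r * (r - 1) * (k - 1) / (2 * k * n))  # equation (8)


def p_approx9(k, r):
    return k ** (1 - r)                                              # equation (9), n infinite


for n in (5, 20, 100):
    print(n, round(p_exact(3, n, 3), 4), round(p_approx8(3, n, 3), 4), round(p_approx9(3, 3), 4))
```

The correction term in (8) is of order 1/n, so it matters mainly for the small sample sizes at the top of Table I.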


6. Kinds of errors. In tests such as the one being considered here the classical
two kinds of errors are not quite adequate to describe the situation.

As usual we may make the errors of

I) rejecting the null hypothesis when it is true,

II) accepting the null hypothesis when it is false.

But there is a third kind of error which is of interest because the present test of
significance is tied up closely with the idea of making a correct decision about
which distribution function has slipped furthest to the right. We may make
the error of

III) correctly rejecting the null hypothesis for the wrong reason.

In other words, it is possible for the null hypothesis to be false. It is also possible
to reject the null hypothesis because some sample $O_i$ has too many observations
which are greater than all observations in the other samples. But
the population from which some other sample, say $O_j$, is drawn is in fact the
rightmost population. In this case we have committed an error of the third kind.

When we come to the power of the test under consideration we shall compute
the probability that we reject the null hypothesis because the rightmost population
yields a sample with too many large observations. Thus by the power of
this test we shall mean the probability of both correct rejection and correct
choice of rightmost population, when it exists.

Errors of the third kind happen in conventional tests of differences of means,
but they are usually not considered although their existence is probably
recognized. It seems to the author that there may be several reasons for this, among
which are 1) a preoccupation on the part of mathematical statisticians with the
formal questions of acceptance and rejection of null hypotheses without adequate
consideration of the implications of the error of the third kind for the
practical experimenter, 2) the rarity with which an error of the third kind arises
in the usual tests of significance.

In passing we note further that it is possible in the present problem for both
the null hypothesis and the alternative hypothesis to be false when k > 2. This
may happen when there are, say, two identical rightmost populations, and the
remaining populations are shifted to the left. An examination of Table I will
give us an idea of what will happen in such a case. If k = 4, we use r = 3 as
about the .05 level. If two of the populations are slipped very far to the left,
while the rightmost two populations are identical, in effect k = 2. In this case
the probability of rejecting the null hypothesis is around .2. Consequently we
accept the null hypothesis about 80 per cent of the time, and reject it 20 per cent
of the time under these conditions. But neither hypothesis was true.
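The figures in this paragraph can be reproduced from equation (7) once a sample size is chosen. The text does not state one, so n = 10 below is purely an illustrative assumption. With two populations slipped far to the left, only the two identical rightmost samples compete for the extreme observations. The rejection probability is therefore that of the k = 2 test with the same critical value r = 3.

```python
from math import comb


def p(k, n, r):
    """Equation (7): probability of r or more extreme observations under H0."""
    return k * comb(n, r) / comb(k * n, r)


n = 10  # assumed sample size, for illustration only
print(round(p(4, n, 3), 3))  # nominal level with k = 4: 0.049
print(round(p(2, n, 3), 3))  # effective k = 2 case: 0.211
```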

If we carry the discussion to its ultimate conclusion we would need a fourth
kind of error for these troublesome situations. There are still other kinds of
errors which will not be considered here.

7. The power of the test. It is difficult to discuss the power of a non-parametric
test, but in the present case it may be worthwhile to give an example or
two. The reader will understand that although the test is called non-parametric,
its power does depend on the distribution function.

In the case of k samples there are two extremes which might be considered for
any particular form of distribution function. In Case 1, we suppose that
when the alternative hypothesis is true, k − 1 of the populations are identical
with distribution function f(x), while the remaining distribution function is
f(x − a), a > 0. Case 1 may be regarded as a lower bound to the power of the
test because for any fixed distance a between the location parameters of the
rightmost population and the next rightmost population, Case 1 gives the least
chance of detecting the falsity of the null hypothesis.

In Case 2, we suppose that the rightmost population is f(x − a), a > 0 as
before, that the next rightmost population is f(x), and that the other k − 2
populations have slipped so far to the left that they make no contribution to the
problem of the power. This is an optimistic approach to the power because it
gives an upper bound to the power. When k = 2, Case 1 and Case 2 are identical,
and the power is exactly the power of the test for the particular distribution
function under consideration.

Case 3, which we shall not consider, deals with the situation where there is more
than one rightmost population, but the null hypothesis is false. It is connected
with the fourth kind of error mentioned at the end of section 6.

Table II gives the upper and lower bounds of the power of the test for k = 3,
r = 3, n = 3, when the distribution is uniform and of length unity. The parameter
a is the distance between the location parameter of the rightmost distribution
and that of the next rightmost distribution.

In Table III we give some points on the upper and lower bounds of the power
of the test for the normal distribution with unit standard deviation. The parameter
a is the distance between the mean of the rightmost normal distribution and
the next rightmost, measured in standard deviations. Again we use the case
k = 3, r = 3, n = 3.
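Bounds of this kind can be approximated by simulation. The sketch below is our own Monte Carlo illustration, not the method used for Tables II and III. It assumes that the uniform distribution runs over an interval of length one starting at 0. Since r = n = 3 here, it counts a success only when every observation of the sample from f(x − a) exceeds every observation of the other samples, which is the event used to define the power in this section. Case 1 uses k − 1 = 2 other samples and Case 2 uses one. The function name and its parameters are hypothetical.

```python
import random
from math import comb


def mc_power(a, other_samples, n=3, reps=50_000, normal=False, seed=1):
    """Estimate Pr(all n draws from f(x - a) exceed all draws from the other samples)."""
    rng = random.Random(seed)

    def draw():
        return rng.gauss(0.0, 1.0) if normal else rng.random()

    hits = 0
    for _ in range(reps):
        lowest_right = min(draw() + a for _ in range(n))
        highest_other = max(draw() for _ in range(n * other_samples))
        hits += lowest_right > highest_other
    return hits / reps


# As a -> 0 the earmarked-sample probability is C(n, r)/C(kn, r) with r = n.
print(1 / comb(9, 3), mc_power(0.0, other_samples=2))  # Case 1, k = 3
print(1 / comb(6, 3), mc_power(0.0, other_samples=1))  # Case 2, effectively k = 2
```

For the uniform case with a ≥ 1 the shifted sample lies entirely to the right of the others, so both bounds equal 1.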


TABLE II


Power p of the test for the uniform distribution when k = 3, r = 3, n = 3. The
distance between the midpoints of the two rightmost distributions is a.

TABLE III


Power p of the test for the unit normal when k = 3, r = 3, n = 3. The distance
between the means of the two rightmost distributions, measured in standard
deviations, is a.
The power of the test has been defined as the probability of correctly rejecting
the null hypothesis and finding the sample from the rightmost population to be
the extreme one. This raises a question about the meaning of the entries in
Tables II and III under a = 0. When a = 0 there is no way to reject the null
hypothesis correctly. The probabilities given are the probabilities that a
randomly chosen sample will force a rejection of the null hypothesis. They
represent the limit of the power function as a tends to zero. If we think of
earmarking the sample from the rightmost population and of computing repeatedly
the probability that that sample will have three observations larger than all the
observations in the other samples, and then we let a tend to zero, this is the result
we get. These values are not the significance levels. The significance level is
8. Discussion. The reader may rightly feel that the solution here presented
to the problem of the greatest one depends on a trick. That is, it depends
intimately on the choice of the null hypothesis. Furthermore the reader may
feel that the choice of $a_1=a_2=\cdots=a_k$ is neither an interesting null hypothesis
nor one which is likely to arise in a practical situation. The author has
no quarrel with this attitude. This means that there are many other approaches
to this problem which are worth trying. The equal-location-parameter case is
one which yields easily to non-parametric methods.

It will be noted that a useful technique has been indicated which allows one
to examine the data before making the significance test. In general one may
wish to set up a test function, decide which of several samples provides the
extreme value of the function, and then test significance given that we have chosen
that sample which maximizes the function among the k samples under
consideration.

9. Conclusion. There is a large class of problems grouped around "the problem
of the greatest one." First, it would be useful to have a more powerful test
than the one here proposed. Second, there is the problem of deciding on the
basis of samples whether we have successfully predicted the order of the location
parameters of several populations. Third, there is the general problem of what
alternatives, what null hypotheses, and what test functions to use in treating
samples from more than two populations. It is to be hoped that more material
on these problems will appear, because answers to these questions are urgently
needed in practical problems.



ON THE UNIQUENESS OF SIMILAR REGIONS 

By Paul G. Hoel 
University of California at Los Angeles 

1. Summary. Conditions are determined for insuring that Neyman’s method 
of constructing similar regions by means of sufficient statistics will yield all such 
regions when such statistics exist. 


2. Introduction. In designing tests of composite hypotheses, one encounters
the problem of how to construct similar regions and whether the construction
process yields all possible similar regions. Neyman has derived methods for
obtaining similar regions when the basic distribution function satisfies certain
partial differential equations [1] and also when a sufficient set of statistics exists for
the unknown parameters [2]. In the former case, the construction process gave
all such regions; however the question of whether certain subregions were
independent of the parameters was left unanswered. In the latter case, the
independence was obvious, but the question of uniqueness was not considered. In
obtaining sufficient conditions for the existence of a type B region, Scheffé [3]
employed Neyman's differential equations assumptions and methods and
demonstrated that the subregions were independent of the parameters.

The method of constructing similar regions by means of sufficient statistics is 
much simpler to demonstrate than is the method based on differential equations. 
It also has the advantage that the independence of the subregions requires no 
proof. It possesses the disadvantage that the question of uniqueness is not 
answered. This question can be answered by showing that the assumption of a 
sufficient set of statistics includes the differential equations assumption and then 
employing methods based on the latter assumption. Such a procedure would 
deprive the sufficiency method of its simplicity; consequently a relatively simple
direct proof of uniqueness has been constructed. The method of proof also shows 
the equivalence of the two methods of constructing similar regions. 


3. Sufficient conditions for uniqueness. Consider a distribution function,
$f(x\mid\theta_1,\ldots,\theta_\nu)$, of the variable x that depends upon the ν parameters $\theta_1,\ldots,\theta_\nu$.
Let $x_1, x_2, \ldots, x_n$ denote a random sample from this distribution and let
$f(x_1,\ldots,x_n\mid\theta_1,\ldots,\theta_\nu)$ denote the distribution function of such a sample. It
will be assumed that n > ν.


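Suppose there exists a sufficient set of statistics $T_1(x_1,\ldots,x_n), \ldots, T_\nu(x_1,\ldots,x_n)$
with respect to the parameters $\theta_1,\ldots,\theta_\nu$. Koopman [4] has
shown that if the T's are continuous and if $f(x\mid\theta_1,\ldots,\theta_\nu)$ is analytic, then
$f(x\mid\theta_1,\ldots,\theta_\nu)$ must be a function of the form

$$f(x\mid\theta_1,\ldots,\theta_\nu)=\exp\Bigl[\sum_{k=1}^{\mu}\Theta_k X_k(x)+\Theta+X(x)\Bigr], \tag{1}$$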
where the $\Theta_k$ and $\Theta$ are single-valued analytic functions of the θ's only, and the
$X_k$ and X are single-valued analytic functions of x only. He has also shown that
if μ assumes its smallest possible value, then

$$\sum_{i=1}^{n}X_k(x_i)=V_k(T_1,\ldots,T_\nu), \tag{2}$$

where the V's are single-valued functions of the T's. If the preceding conditions
are satisfied, it follows from (1) and (2) that

$$f(x_1,\ldots,x_n\mid\theta_1,\ldots,\theta_\nu)=\exp\Bigl[\sum_{k=1}^{\mu}\Theta_k V_k+n\Theta+\sum_{i=1}^{n}X(x_i)\Bigr]. \tag{3}$$
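As a concrete instance of (1) to (3), consider the normal distribution with mean θ1 and standard deviation θ2. This is our example and is not discussed in the text. It has Θ1 = θ1/θ2², X1(x) = x, Θ2 = −1/(2θ2²), X2(x) = x², Θ = −θ1²/(2θ2²) − log θ2 and X(x) = −½ log 2π, so that V1 = Σ x_i and V2 = Σ x_i². The sketch below, with hypothetical function names, checks numerically that the exponent in (3) reproduces the sample log-density.

```python
import math


def log_density_direct(xs, mean, sd):
    """Sum of normal log-densities, computed term by term."""
    return sum(-0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)
               for x in xs)


def log_density_exponential_form(xs, mean, sd):
    """The exponent of (3) for the normal case, with mu = nu = 2."""
    big_theta_1 = mean / sd ** 2
    big_theta_2 = -1.0 / (2 * sd ** 2)
    big_theta = -mean ** 2 / (2 * sd ** 2) - math.log(sd)
    v1 = sum(xs)                  # V_1 = sum of X_1(x_i) = sum of x_i
    v2 = sum(x * x for x in xs)   # V_2 = sum of X_2(x_i) = sum of x_i^2
    n = len(xs)
    return big_theta_1 * v1 + big_theta_2 * v2 + n * big_theta + n * (-0.5 * math.log(2 * math.pi))


sample = [0.3, -1.2, 2.5, 0.8]
print(math.isclose(log_density_direct(sample, 0.4, 1.7),
                   log_density_exponential_form(sample, 0.4, 1.7), rel_tol=1e-10))
```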

Now it is known [2] that if the T's possess continuous partial derivatives and
are such that it is possible to introduce additional functions $T_{\nu+1},\ldots,T_n$ which
will make the transformation

$$T_1=T_1(x_1,\ldots,x_n),\quad\ldots,\quad T_n=T_n(x_1,\ldots,x_n) \tag{4}$$

one-to-one, then $f(x_1,\ldots,x_n\mid\theta_1,\ldots,\theta_\nu)$ can be written in the form

$$f(x_1,\ldots,x_n\mid\theta_1,\ldots,\theta_\nu)=f_1(T_1,\ldots,T_\nu\mid\theta_1,\ldots,\theta_\nu)\,f_2(x_1,\ldots,x_n\mid T_1,\ldots,T_\nu), \tag{5}$$

where $f_1$ is the distribution function of the T's and $f_2$ is the conditional distribution
function of the x's for fixed values of the T's. The function $f_2$ does not depend
upon any of the parameters $\theta_1,\ldots,\theta_\nu$.

For the purpose of constructing similar regions, it is desirable to work with $f_1$.
By combining (3) and (5), $f_1$ may be expressed in the form

$$f_1(T_1,\ldots,T_\nu\mid\theta_1,\ldots,\theta_\nu)=\exp\Bigl[\sum_{k=1}^{\mu}\Theta_k V_k+n\Theta+H\Bigr], \tag{6}$$

where $H=\sum_i X(x_i)-\log f_2$ can be expressed as a function of $T_1,\ldots,T_\nu$ only,
and where it is assumed that $f_2>0$.

The method employed by Neyman to obtain a similar region of size α is to
build it up as the locus of subregions of size α on the "surfaces" obtained by giving
the T's constant values. Since the size of such a subregion is obtained by integrating
$f_2$ over the subregion, it will depend only upon the T's; consequently a
subregion can be selected that will be of size α for every set of values of the T's.

Now consider the construction of a similar region of size α by building up the
region as the locus of subregions of varying size rather than of constant size on
the surfaces that are obtained by giving the T's constant values. Let $w_1$ and $w_2$
be two regions of size α and let $\alpha_1(T_1,\ldots,T_\nu)$ and $\alpha_2(T_1,\ldots,T_\nu)$ denote the
sizes of the surface subregions. It will be assumed that the regions under
consideration are such that $\alpha_1$ and $\alpha_2$ are obtainable from integrating $f_2$ over the
subregion common to $w_1$ and $w_2$, respectively, and the surface determined by fixing
the values of the T's. The problem then is to determine whether two different
functions, $\alpha_1$ and $\alpha_2$, can yield similar regions of size α.

Since a critical region can be obtained as the locus of subregions, $\alpha_1$ and $\alpha_2$ will
yield similar regions of size α only if

$$\int\!\cdots\!\int \alpha_j f_1\,dT_1\cdots dT_\nu=\alpha\qquad(j=1,2), \tag{7}$$

where the integration extends over the range of values of the T's. By means of
(6), condition (7) may be written as

$$\int\!\cdots\!\int \alpha_j \exp\Bigl[\sum_{k}\Theta_k V_k+n\Theta+H\Bigr]dT_1\cdots dT_\nu=\alpha\qquad(j=1,2). \tag{8}$$

If $e^{n\Theta}$ is factored out, it is clear that condition (8) will hold only if

$$\int\!\cdots\!\int \alpha_1 \exp\Bigl[\sum_{k}\Theta_k V_k+H\Bigr]dT_1\cdots dT_\nu=\int\!\cdots\!\int \alpha_2 \exp\Bigl[\sum_{k}\Theta_k V_k+H\Bigr]dT_1\cdots dT_\nu \tag{9}$$

is an identity in the θ's, and hence in the $\Theta_k$ for the region in the Θ space that
corresponds to the region in the parameter space for which the parameters $\theta_1,\ldots,\theta_\nu$
are defined.

Now assume that μ = ν and that the transformation

$$V_1=V_1(T_1,\ldots,T_\nu),\quad\ldots,\quad V_\nu=V_\nu(T_1,\ldots,T_\nu) \tag{10}$$

is one-to-one. From the preceding assumptions that gave rise to (2) and (4), it
may be shown that the V's are continuous and possess continuous partial derivatives.
In terms of the V's, (9) may therefore be written as

$$\int\!\cdots\!\int \exp\Bigl[\sum_{k}\Theta_k V_k\Bigr]J K_1\,dV_1\cdots dV_\nu=\int\!\cdots\!\int \exp\Bigl[\sum_{k}\Theta_k V_k\Bigr]J K_2\,dV_1\cdots dV_\nu, \tag{11}$$

where $K_j=\alpha_j e^{H}$ has been expressed in terms of the V's, and J is the Jacobian of
the T's with respect to the V's.

If the parameters are defined over intervals and $\Theta_k$ is an analytic function
of those parameters, to every region in the parameter space determined by
intervals of the θ's there will correspond an interval for $\Theta_k$, throughout which $\Theta_k$
will be defined; consequently (11) will be an identity in the $\Theta_k$ for intervals of
values. For every point within regions determined by $\Theta_k$ intervals, the partial
derivatives of the two sides of (11) must therefore be equal, provided the derivatives
exist and provided the $\Theta_k$ are functionally independent.

If the conditions to be imposed shortly are satisfied, it can easily be shown that
it is permissible to differentiate (11) repeatedly under the integral signs with
respect to the $\Theta_k$'s. As a consequence, (11) implies that for all sets of non-negative
integers $k_1,\ldots,k_\nu$,

$$\int\!\cdots\!\int V_1^{k_1}\cdots V_\nu^{k_\nu}\exp\Bigl[\sum_{k}\Theta_k V_k\Bigr]J K_1\,dV_1\cdots dV_\nu=\int\!\cdots\!\int V_1^{k_1}\cdots V_\nu^{k_\nu}\exp\Bigl[\sum_{k}\Theta_k V_k\Bigr]J K_2\,dV_1\cdots dV_\nu \tag{12}$$

will hold for almost all values of the $\Theta_k$. But (12) is equivalent
to requiring that


$$\int\!\cdots\!\int V_1^{k_1}\cdots V_\nu^{k_\nu}\,g_1(V_1,\ldots,V_\nu)\,dV_1\cdots dV_\nu=\int\!\cdots\!\int V_1^{k_1}\cdots V_\nu^{k_\nu}\,g_2(V_1,\ldots,V_\nu)\,dV_1\cdots dV_\nu \tag{13}$$

shall hold for all sets of non-negative integers $k_1,\ldots,k_\nu$, where $g_1$ and $g_2$ are
the integrands of (11) after they have been divided by the function of the $\Theta_k$ obtained
from integrating (11). Since $g_1$ and $g_2$ will then be non-negative functions
of the V's whose integrals over all values of the V's are one, they are distribution
functions of the V's. If $g_1$ and $g_2$ possess moments of all orders and are such that
they are uniquely determined by their moments, then condition (13) implies that

$$g_1(V_1,\ldots,V_\nu)=g_2(V_1,\ldots,V_\nu). \tag{14}$$

This identity will hold for almost all values of the parameters. If the conditions
necessary to justify (14) are satisfied, it therefore follows that

$$\alpha_1(T_1,\ldots,T_\nu)=\alpha_2(T_1,\ldots,T_\nu),$$

and that Neyman's method of constructing similar regions by choosing
$\alpha(T_1,\ldots,T_\nu)=\alpha$ yields all possible similar regions of the class of regions
being considered.

The conditions that were imposed on $f(x\mid\theta_1,\ldots,\theta_\nu)$ in order to establish
uniqueness may be summarized as follows: The distribution function
$f(x\mid\theta_1,\ldots,\theta_\nu)$ is analytic and possesses a set of sufficient statistics, $T_1,\ldots,T_\nu$,
with respect to the parameters $\theta_1,\ldots,\theta_\nu$, that are continuous and possess continuous
partial derivatives. There exist one-to-one transformations of the types
(4) and (10). The function $c\,e^{\sum\Theta_k V_k+H}$, treated as a distribution function of the
V's, possesses moments of all orders and is uniquely determined by its moments.
Finally, the $\Theta_k$ are functionally independent, with the smallest possible value of
μ equal to ν.

If the assumption that the $\Theta_k$ are independent is not realized, the distribution
function (1) could be expressed in terms of fewer than ν parameters. This is
also true if μ < ν. The two assumptions that μ = ν and that the $\Theta_k$ are independent
will therefore be satisfied if (1) is expressed in terms of the minimum number
of parameters. The remaining assumptions can often be checked quite easily
whenever a particular distribution function is given.

In deriving tests of hypotheses for certain parameters, the distribution function
$f(x\mid\theta_1,\ldots,\theta_\nu)$ will of course contain those parameters in addition to the
parameters $\theta_1,\ldots,\theta_\nu$, but since they will have fixed values, it was not necessary to
introduce them into the discussion.


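4. Equivalence of methods. Although the equivalence of the two methods
of constructing similar regions has been implied in the literature [1], no simple
demonstration seems to be available. Such a demonstration is easily given by
means of (3). Let

$$\varphi_i = \frac{\partial \log f}{\partial \theta_i}, \qquad (i = 1, \dots, \nu),$$

where $f$ is given by (3) with $n = \nu$, and let

Differentiation of (3) yields

$$\varphi_i = \sum_{k=1}^{\nu} V_k \frac{\partial \Theta_k}{\partial \theta_i} + \frac{\partial G}{\partial \theta_i}, \qquad \frac{\partial \varphi_i}{\partial \theta_j} = \sum_{k=1}^{\nu} V_k \frac{\partial^2 \Theta_k}{\partial \theta_i \, \partial \theta_j} + \frac{\partial^2 G}{\partial \theta_i \, \partial \theta_j}. \tag{15}$$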

The differential equations that are assumed to hold in the other method of construction [1] may be written in the form

$$\frac{\partial \varphi_i}{\partial \theta_j} = A_{ij} + \sum_{r=1}^{\nu} B_{ijr} \, \varphi_r, \qquad (i, j = 1, \dots, \nu), \tag{16}$$

where the $A_{ij}$ and $B_{ijr}$ are functions of the $\theta$'s only. Upon substituting the
values given by (15), it will be found that (16) will be satisfied if

$$A_{ij} = \frac{\partial^2 G}{\partial \theta_i \, \partial \theta_j} - \sum_{r=1}^{\nu} B_{ijr} \frac{\partial G}{\partial \theta_r} \quad \text{and} \quad \frac{\partial^2 \Theta_k}{\partial \theta_i \, \partial \theta_j} = \sum_{r=1}^{\nu} B_{ijr} \frac{\partial \Theta_k}{\partial \theta_r}, \qquad (k = 1, \dots, \nu). \tag{17}$$

Since (17) represents a set of $\nu$ equations in the $B_{ijr}$'s, whose coefficient matrix is
non-singular because of the functional independence of the $\Theta_k$, it follows that
sets of $A$'s and $B$'s can be found to satisfy equations (16). This shows that the
sufficiency assumption includes the differential equations assumption.

Now the method of constructing similar regions here consists in building them
up as the locus of subregions of size $\alpha$ on the surfaces obtained by giving the $\varphi_i$
constant values. But from (15) it follows that the surface $\varphi_i = c_i$ $(i = 1, \dots, \nu)$
is equivalent to the surface

$$\sum_{k=1}^{\nu} V_k \frac{\partial \Theta_k}{\partial \theta_i} + \frac{\partial G}{\partial \theta_i} = c_i, \qquad (i = 1, \dots, \nu),$$

which may be written in the form

$$\sum_{k=1}^{\nu} \frac{\partial \Theta_k}{\partial \theta_i} \, V_k = c_i', \qquad (i = 1, \dots, \nu), \tag{18}$$

because $G$ is a function of the parameters only. Since the coefficient matrix of the
$V$'s in (18) is nonsingular, (18) may be solved for the $V$'s; consequently the surface
$\varphi_i = c_i$ $(i = 1, \dots, \nu)$ is equivalent to the surface $V_i = c_i''$ $(i = 1, \dots, \nu)$.

But from the assumption concerning the transformation (10), the surface
$V_i = c_i''$ $(i = 1, \dots, \nu)$ is equivalent to the surface $T_i = c_i'''$ $(i = 1, \dots, \nu)$.
Thus the two surfaces $\varphi_i = c_i$ and $T_i = c_i'''$ $(i = 1, \dots, \nu)$ are
equivalent, and hence the two methods of constructing similar regions are
equivalent.
NOTES

This section is devoted to brief research and expository articles and other short items.


CONVERGENCE OF DISTRIBUTIONS 

By Herbert Robbins 


University of North Carolina 
Let $f_n(x)$ $(n = 0, 1, 2, \dots)$ be frequency functions,

$$f_n(x) \ge 0, \qquad \int_{-\infty}^{\infty} f_n(x) \, dx = 1. \tag{1}$$

There are various ways in which the sequence of distributions corresponding to
the $f_n(x)$ $(n = 1, 2, \dots)$ may be said to converge to the distribution corresponding
to $f_0(x)$. The definition customarily adopted in mathematical statistics
(see e.g. [1]) is equivalent to the condition

(a) $\lim_{n \to \infty} \int_{-\infty}^{\xi} f_n(x) \, dx = \int_{-\infty}^{\xi} f_0(x) \, dx$ for every $\xi$.¹

We shall also consider the two further conditions

(b) $\lim_{n \to \infty} \int_S f_n(x) \, dx = \int_S f_0(x) \, dx$ for every Borel set $S$,

and

(c) $\lim_{n \to \infty} \int_S f_n(x) \, dx = \int_S f_0(x) \, dx$ uniformly for all Borel sets $S$.

It is clear that (c) implies (b) and that (b) implies (a). That the converse 
implications do not hold is shown by the following examples. 

Example 1. Let $f_0(x) = 1$ for $0 < x < 1$ and 0 elsewhere. Choose and fix
any $0 < \epsilon < 1$, set $\delta_n = \epsilon / (n \cdot 2^n)$, and for $n = 1, 2, \dots$ let $f_n(x) = 1/(n \delta_n)$ for
$i/n - \delta_n < x < i/n$ $(i = 1, 2, \dots, n)$ and 0 elsewhere. If we denote by $S_n$
the set of all $x$ for which $f_n(x) > 0$, it is easy to see that for $n = 1, 2, \dots$

$$0 \le \int_{-\infty}^{\xi} f_0(x) \, dx - \int_{-\infty}^{\xi} f_n(x) \, dx \le 1/n \quad \text{for every } \xi, \tag{2}$$

$$\int_{S_n} f_0(x) \, dx = \epsilon / 2^n, \qquad \int_{S_n} f_n(x) \, dx = 1. \tag{3}$$

¹ From a well-known theorem of Pólya the convergence is then necessarily uniform for all $\xi$.
Hence for the Borel set $S = \sum_n S_n$ it follows that

$$\int_S f_0(x) \, dx \le \sum_n \int_{S_n} f_0(x) \, dx = \epsilon, \tag{4}$$

$$\int_S f_n(x) \, dx = \int_{S_n} f_n(x) \, dx = 1, \qquad (n = 1, 2, \dots). \tag{5}$$

From (2) we see that (a) holds (uniformly for all $\xi$), and from (4) and (5) that
(b) fails about as badly as possible.
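Example 1 can be checked numerically. The following is a minimal Python sketch, assuming the arbitrary choice $\epsilon = 1/100$; the function names (`robbins_example1`, `cdf_n`, `cdf_0`, `cdf_gap_range`) are hypothetical and introduced only for illustration. It builds $f_n$ with exact rational arithmetic, confirms the two integrals in (3), and checks inequality (2) on a grid of $\xi$ values. A grid check illustrates (2) but is not a proof of it.

```python
from fractions import Fraction

def robbins_example1(n, eps):
    """Support intervals and height of f_n in Example 1 (exact rational arithmetic).

    f_n = 1/(n * delta_n) on (i/n - delta_n, i/n), i = 1..n, with delta_n = eps/(n 2^n).
    """
    delta = eps / (n * 2 ** n)
    intervals = [(Fraction(i, n) - delta, Fraction(i, n)) for i in range(1, n + 1)]
    height = 1 / (n * delta)
    return intervals, height

def cdf_n(xi, intervals, height):
    """Integral of f_n from minus infinity to xi."""
    total = Fraction(0)
    for a, b in intervals:
        if xi > a:
            total += height * (min(xi, b) - a)
    return total

def cdf_0(xi):
    """Integral of f_0 (uniform density on (0, 1)) from minus infinity to xi."""
    return min(max(xi, Fraction(0)), Fraction(1))

def cdf_gap_range(n, eps, grid=1000):
    """Smallest and largest value of cdf_0 - cdf_n over xi = k/grid, k = -10..grid+10."""
    intervals, height = robbins_example1(n, eps)
    gaps = [cdf_0(Fraction(k, grid)) - cdf_n(Fraction(k, grid), intervals, height)
            for k in range(-10, grid + 11)]
    return min(gaps), max(gaps)

eps = Fraction(1, 100)
for n in (1, 2, 5):
    intervals, height = robbins_example1(n, eps)
    p0_Sn = sum(b - a for a, b in intervals)              # P_0(S_n) = length of S_n
    pn_Sn = sum(height * (b - a) for a, b in intervals)   # P_n(S_n)
    print(n, p0_Sn, pn_Sn, cdf_gap_range(n, eps))
```

Exact fractions are used so that the equalities in (3) hold exactly rather than up to rounding.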

This construction can be modified to apply to any $f_0(x)$; thus choosing $f_0(x) = (2\pi)^{-1/2} e^{-x^2/2}$
we can construct $f_n(x)$ $(n = 1, 2, \dots)$ and a Borel set $S$ such that

$$\lim_{n \to \infty} \int_{-\infty}^{\xi} f_n(x) \, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\xi} e^{-x^2/2} \, dx \quad \text{uniformly for all } \xi,$$

while

$$\int_S \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx = .01, \qquad \int_S f_n(x) \, dx = 1, \qquad (n = 1, 2, \dots).$$

It is conceivable that some time a statistician, failing to consider such a possibility,
will be led to approximate .01 by 1.

If $X_n$ is a random variable with frequency function $f_n(x)$, if $y = g(x)$ is a Borel
function, and if (a) holds, then it follows from Example 1 that the distribution
function $H_n(y)$ of $Y_n = g(X_n)$, equal to the integral of $f_n(x)$ over the set $S_y$ of all
$x$ such that $g(x) \le y$, need not converge to the distribution function $H_0(y)$ of
$Y_0 = g(X_0)$. It is easily seen that this possibility is excluded if, as commonly
occurs in applications, $g(x)$ is such that for every $y$ the intersection of $S_y$ with
any finite interval is the sum of a finite number of intervals (e.g., if $g(x) = \sin x$).

Example 2. Let $f_0(x)$ be defined as in the previous example, and for $n = 1, 2, \dots$
let $f_n(x) = 1 + \sin(2\pi n x)$ for $0 < x < 1$ and 0 elsewhere. By the
Riemann-Lebesgue theorem it follows that (b) holds. But let $S_n$ denote the
set of all $x$ for which $f_n(x) > 1$; then

$$\int_{S_n} f_0(x) \, dx = \tfrac{1}{2}, \qquad \int_{S_n} f_n(x) \, dx = \tfrac{1}{2} + 1/\pi, \qquad (n = 1, 2, \dots),$$

so that (c) does not hold.
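The two integrals of Example 2 can be confirmed with a simple quadrature. The Python sketch below uses the midpoint rule; `example2_masses` and its `steps` parameter are hypothetical names chosen for illustration. It also returns $\tfrac12 \int |f_n - f_0| \, dx$. By a standard identity that the note does not itself state, this quantity equals $\sup_S |P_n(S) - P_0(S)|$, and the supremum is attained at $S_n$, so all three numbers stay fixed as $n$ grows.

```python
import math

def example2_masses(n, steps=100_000):
    """Midpoint-rule estimates for Example 2 with f_n(x) = 1 + sin(2 pi n x) on (0, 1).

    Returns (P_0(S_n), P_n(S_n), half_l1), where S_n = {x : f_n(x) > 1} and
    half_l1 = (1/2) * integral over (0, 1) of |f_n - f_0|.
    """
    h = 1.0 / steps
    p0 = pn = half_l1 = 0.0
    for k in range(steps):
        x = (k + 0.5) * h                        # midpoint of the k-th cell
        fn = 1.0 + math.sin(2 * math.pi * n * x)
        if fn > 1.0:                             # x belongs to S_n
            p0 += h                              # f_0 = 1 on (0, 1)
            pn += fn * h
        half_l1 += 0.5 * abs(fn - 1.0) * h
    return p0, pn, half_l1

for n in (1, 2, 10):
    p0, pn, half_l1 = example2_masses(n)
    print(n, round(p0, 4), round(pn, 4), round(half_l1, 4), round(1 / math.pi, 4))
```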

It follows from these examples that (a), (b), and (c) are successively stronger 
definitions of convergence. We shall now give some definitions equivalent to 
(b) and (c). 

First we recall that the non-negative, completely additive, and absolutely
continuous set functions

$$P_n(S) = \int_S f_n(x) \, dx, \qquad (n = 1, 2, \dots), \tag{6}$$

are said to be uniformly absolutely continuous if for every $\epsilon > 0$ there exists a
$\delta > 0$ such that for any $S$ and any $n = 1, 2, \dots$,

(7) $m(S) < \delta$ implies $P_n(S) < \epsilon$.

We shall denote the condition that the $P_n(S)$ be uniformly absolutely continuous
by (u.a.c.), and we shall now prove that (b) is equivalent to

(b') (a) and (u.a.c.).

Proof. (A) Suppose (b) holds. It is clear that (a) holds, and we shall show
by contradiction that (u.a.c.) holds also. For if not, then there would exist an
$\epsilon > 0$ such that for any $\eta > 0$ we could find a set $S$ and an integer $n$ such that

(8) $m(S) < \eta$, $P_n(S) \ge \epsilon$.

Moreover, since the set function

$$P_0(S) = \int_S f_0(x) \, dx$$

is absolutely continuous, there exists a $\delta > 0$ such that

(9) $m(S) < \delta$ implies $P_0(S) < \epsilon/2$.

Now by (8) there exists an $S_1$ with $m(S_1) < \delta/2$ and a $k_1$ such that $P_{k_1}(S_1) \ge \epsilon$.
Next, there exists an $S_2$ with $m(S_2) < \delta/2^2$ and a $k_2$ such that $P_{k_2}(S_2) \ge \epsilon$, and
it is easy to see that we may assume that $k_2 > k_1$. Proceeding in this way we
find a sequence of integers $k_1 < k_2 < \cdots$ and of sets $S_1, S_2, \dots$ such that

(10) $m(S_n) < \delta/2^n$, $P_{k_n}(S_n) \ge \epsilon$, $(n = 1, 2, \dots)$.

Let $S = \sum_n S_n$; then by (10), $m(S) \le \sum_n m(S_n) < \delta$, so that by (9),

(11) $P_0(S) < \epsilon/2$.

But by (10),

(12) $P_{k_n}(S) \ge P_{k_n}(S_n) \ge \epsilon$, $(n = 1, 2, \dots)$.

Thus (11) and (12) imply that (b) does not hold, which is a contradiction.

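(B) Suppose (b') holds. We shall show first that (b) holds for any set $S_1$ with
$m(S_1) < \infty$. Given any $\epsilon > 0$, by (u.a.c.) and the absolute continuity of $P_0$ there
exists a $\delta > 0$ such that

(13) $m(S) < \delta$ implies $P_n(S) < \epsilon/8$ $(n = 0, 1, 2, \dots)$.

It is known from the theory of measure that corresponding to $S_1$ and to $\delta$ we can
find a set $S_2$ which is the sum of a finite number of disjoint intervals such that

(14) $m((S_1 - S_2) + (S_2 - S_1)) < \delta$.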
From (13), (14), and the relations

(15) $P_n(S_1) = P_n(S_2) + P_n(S_1 - S_2) - P_n(S_2 - S_1)$, $(n = 0, 1, 2, \dots)$,

it follows that

$$|P_0(S_1) - P_n(S_1)| \le |P_0(S_2) - P_n(S_2)| + P_n(S_1 - S_2) + P_n(S_2 - S_1) + P_0(S_1 - S_2) + P_0(S_2 - S_1) < |P_0(S_2) - P_n(S_2)| + \epsilon/2, \tag{16}$$

and from (a), since $S_2$ is a finite sum of intervals, that for large enough $n$,

(17) $|P_0(S_2) - P_n(S_2)| < \epsilon/2$.

Thus from (16) and (17) it follows that for large enough $n$,

(18) $|P_0(S_1) - P_n(S_1)| < \epsilon$,

which proves (b) for the case $m(S) < \infty$.

Now given any $\epsilon > 0$ choose $\alpha, \beta$ so that, setting $A = \{\alpha < x < \beta\}$, we have

(19) $P_0(A) > 1 - \epsilon/4$.

Then it follows from (a) that for large enough $n$,

(20) $P_n(A) > 1 - \epsilon/2$.

Then for any Borel set $S$ we have, for large enough $n$,

$$P_n(S) - P_0(S) = P_n(SA) + P_n(S - A) - P_0(SA) - P_0(S - A),$$

$$|P_n(S) - P_0(S)| \le |P_n(SA) - P_0(SA)| + P_n(S - A) + P_0(S - A) < |P_n(SA) - P_0(SA)| + \epsilon/2 + \epsilon/4.$$

But by the previous case, since $m(SA) < \infty$, for large enough $n$ we shall have
$|P_n(SA) - P_0(SA)| < \epsilon/4$. Hence for large enough $n$,

$$|P_n(S) - P_0(S)| < \epsilon,$$

so that (b) holds in this case also. This completes the proof.

We shall say that $\lim_n f_n(x) = f_0(x)$ in measure if for every $\epsilon > 0$ and for
every set $A$ such that $m(A) < \infty$, the measure of the set of all $x$ in $A$ for which
$|f_n(x) - f_0(x)| > \epsilon$ tends to 0 as $n$ increases. (For a space of finite measure
this reduces to the usual definition.) We now observe that (c) is equivalent to

(c') $\lim_n f_n(x) = f_0(x)$ in measure.

In fact, it is easy to show that (c) is equivalent to convergence in the mean of
order 1,

(c'') $\lim_{n \to \infty} \int_{-\infty}^{\infty} |f_n(x) - f_0(x)| \, dx = 0$,

which implies (c'), and a theorem of Scheffé [2] states that (c') implies (c).²
Finally, it is not hard to show that the condition

(d) $\lim_n f_n(x) = f_0(x)$ almost everywhere

implies (c') but not conversely.

Summing up, we arrive at the following complete set of implication relations
among the various modes of convergence which we have considered:

(20) $(d) \Rightarrow (c'') \Leftrightarrow (c') \Leftrightarrow (c) \Rightarrow (b') \Leftrightarrow (b) \Rightarrow (a)$.
ON RANDOM VARIABLES WITH COMPARABLE PEAKEDNESS


By Z. W. Birnbaum
University of Washington


The quality of a distribution usually referred to as its peakedness has often
been measured by the fourth moment of the distribution. It is known, however,
that there is no definite connection between the value of the fourth moment and
what one may intuitively consider as the amount of peakedness of a distribution.¹
In the present paper a definition of relative peakedness is proposed, and it is shown
that this concept has properties which may make it practically applicable.

Definition. Let $Y$ and $Z$ be real random variables and $Y_1$ and $Z_1$ real constants.
We shall say that $Y$ is more peaked about $Y_1$ than $Z$ about $Z_1$ if the inequality

$$P(|Y - Y_1| \ge T) \le P(|Z - Z_1| \ge T)$$

is true for all $T > 0$.

If, for example, $Y$ and $Z$ are normal random variables with expectations $Y_1$
and $Z_1$ and standard deviations $\sigma_Y$ and $\sigma_Z$, and if $\sigma_Y < \sigma_Z$, then $Y$ is more peaked
about $Y_1$ than $Z$ about $Z_1$. Similarly, if $Y$ is a random variable with probability
density $p(Y) = 1/(b - a)$ for $a \le Y \le b$, and if $Z$ is the discrete random variable
with $P(Z = a) = P(Z = b) = \tfrac{1}{2}$, then $Y$ is more peaked about $\tfrac{1}{2}(a + b)$ than
$Z$ about the same point.
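The defining inequality is easy to probe numerically. The Python sketch below checks it on a finite grid of $T$ values, which illustrates the relation but does not prove it. All function names are hypothetical. Normal tails are computed with `math.erfc`, using $P(|Y - Y_1| \ge T) = \operatorname{erfc}(T / (\sigma\sqrt{2}))$. The sketch also compares a uniform variable on $(a, b)$ with the two-point variable at $a$ and $b$, both taken about $\tfrac12(a + b)$.

```python
import math

def normal_tail(T, sigma):
    """P(|Y - Y1| >= T) for a normal Y with mean Y1 and standard deviation sigma."""
    return math.erfc(T / (sigma * math.sqrt(2)))

def uniform_tail(T, a, b):
    """P(|Y - (a + b)/2| >= T) for Y uniform on (a, b)."""
    half = (b - a) / 2
    return max(0.0, 1 - T / half)

def two_point_tail(T, a, b):
    """P(|Z - (a + b)/2| >= T) when P(Z = a) = P(Z = b) = 1/2."""
    return 1.0 if T <= (b - a) / 2 else 0.0

def more_peaked_on_grid(tail_y, tail_z, Ts):
    """Check P(|Y - Y1| >= T) <= P(|Z - Z1| >= T) for every T in the grid Ts."""
    return all(tail_y(T) <= tail_z(T) for T in Ts)

Ts = [k / 100 for k in range(0, 1001)]   # T from 0 to 10
print(more_peaked_on_grid(lambda T: normal_tail(T, 1.0), lambda T: normal_tail(T, 2.0), Ts))
print(more_peaked_on_grid(lambda T: normal_tail(T, 2.0), lambda T: normal_tail(T, 1.0), Ts))
print(more_peaked_on_grid(lambda T: uniform_tail(T, 0.0, 4.0),
                          lambda T: two_point_tail(T, 0.0, 4.0), Ts))
```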


² Scheffé's theorem asserts that (d) implies (c), but the Lebesgue convergence theorem on
which his proof is based holds for convergence in measure (see e.g. [3]).

¹ See "A common error concerning kurtosis," Am. Stat. Assn. Jour., Vol. 40 (1945).
Lemma. Let $Y_1, Y_2, Z_1, Z_2$ be continuous random variables² with the probability
densities $\varphi_1(Y_1)$, $\varphi_2(Y_2)$, $f_1(Z_1)$, $f_2(Z_2)$ such that

1°. $Y_1$ and $Y_2$ are independent, $Z_1$ and $Z_2$ are independent,

2°. $\varphi_i(Y_i) = \varphi_i(-Y_i)$ for all $Y_i$, $f_i(Z_i) = f_i(-Z_i)$ for all $Z_i$, $(i = 1, 2)$,

3°. $\varphi_2(Y_2)$ and $f_1(Z_1)$ are non-increasing functions for positive values of the
variables, and

4°. $Y_i$ is more peaked about 0 than $Z_i$, for $i = 1, 2$.

Let $Y = Y_1 + Y_2$ and $Z = Z_1 + Z_2$. Under these assumptions $Y$ is more peaked
about 0 than $Z$.

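Proof: Let $\Phi_i(y) = P(Y_i \le y)$, $F_i(z) = P(Z_i \le z)$, for $i = 1, 2$, be the cumulative
probability functions. For any random variables $Y_1, Y_2, Z_1, Z_2$ (not necessarily
continuous) which fulfil assumption 1° we have, for any $T$, the relationships

$$
\begin{aligned}
P(Y \le T) - P(Z \le T) &= \int_{-\infty}^{\infty} [\Phi_1(T - s) \, d\Phi_2(s) - F_1(T - s) \, dF_2(s)] \\
&= \int_{-\infty}^{\infty} [\Phi_1(T - s) - F_1(T - s)] \, d\Phi_2(s) + \int_{-\infty}^{\infty} F_1(T - s) [d\Phi_2(s) - dF_2(s)] \\
&= \int_{-\infty}^{\infty} [\Phi_1(T - s) - F_1(T - s)] \, d\Phi_2(s) - \int_{-\infty}^{\infty} [\Phi_2(s) - F_2(s)] \, dF_1(T - s) \\
&= \int_{-\infty}^{\infty} [\Phi_1(T - s) - F_1(T - s)] \, d\Phi_2(s) + \int_{-\infty}^{\infty} [\Phi_2(T - s) - F_2(T - s)] \, dF_1(s) \\
&= I_1(T) + I_2(T),
\end{aligned}
$$

where

$$
\begin{aligned}
I_1(T) &= \int_{-\infty}^{\infty} [\Phi_1(T - s) - F_1(T - s)] \, d\Phi_2(s) = \int_{-\infty}^{\infty} [\Phi_1(-s) - F_1(-s)] \, d\Phi_2(T + s) \\
&= \int_0^{\infty} \{[F_1(s) - \Phi_1(s)] \, d\Phi_2(T - s) + [\Phi_1(-s) - F_1(-s)] \, d\Phi_2(T + s)\},
\end{aligned}
$$

etc.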


² As defined e.g. in H. Cramér, Mathematical Methods of Statistics, Princeton University
Press, 1946, p. 169.
If the random variables have distributions symmetrical about zero (assumption
2°), this is equal to

$$
\begin{aligned}
&\int_0^{\infty} \{[P(Z_1 \le s) - P(Y_1 \le s)] \, dP(Y_2 \le T - s) + [P(Y_1 \le -s) - P(Z_1 \le -s)] \, dP(Y_2 \le T + s)\} \\
&= \int_0^{\infty} \{[1 - P(Z_1 > s) - 1 + P(Y_1 > s)] \, dP(Y_2 \le T - s) + [P(Y_1 \ge s) - P(Z_1 \ge s)] \, dP(Y_2 \le T + s)\} \\
&= \int_0^{\infty} \{[P(Y_1 \ge s) - P(Z_1 \ge s)] \, d[P(Y_2 \le T + s) + P(Y_2 \le T - s)] - [P(Y_1 = s) - P(Z_1 = s)] \, dP(Y_2 \le T - s)\},
\end{aligned}
$$

and we obtain

$$I_1(T) = \int_0^{\infty} [P(Y_1 \ge s) - P(Z_1 \ge s)] \, d[P(Y_2 \le T + s) + P(Y_2 \le T - s)] - \int_0^{\infty} [P(Y_1 = s) - P(Z_1 = s)] \, dP(Y_2 \le T - s). \tag{1.1}$$
Jo 


By an analogous argument one derives the equality

$$I_2(T) = \int_0^{\infty} [P(Y_2 \ge s) - P(Z_2 \ge s)] \, d[P(Z_1 \le T + s) + P(Z_1 \le T - s)] - \int_0^{\infty} [P(Y_2 = s) - P(Z_2 = s)] \, dP(Z_1 \le T - s). \tag{1.2}$$


Making use of the assumption that $Y_1, Y_2, Z_1, Z_2$ are continuous random variables,
we conclude that the second integrals in (1.1) and (1.2) are zero, and we
may write

$$I_1(T) = \int_0^{\infty} [P(Y_1 \ge s) - P(Z_1 \ge s)] \, [\varphi_2(T + s) - \varphi_2(T - s)] \, ds, \tag{2.1}$$

$$I_2(T) = \int_0^{\infty} [P(Y_2 \ge s) - P(Z_2 \ge s)] \, [f_1(T + s) - f_1(T - s)] \, ds. \tag{2.2}$$

For $T \ge 0$ we have, making use of assumption 3°,

$$\varphi_2(T + s) - \varphi_2(T - s) \le 0 \quad \text{if } 0 \le s \le T,$$

$$\varphi_2(T + s) - \varphi_2(T - s) = \varphi_2(s + T) - \varphi_2(s - T) \le 0 \quad \text{if } 0 \le T \le s,$$

and similarly

$$f_1(T + s) - f_1(T - s) \le 0 \quad \text{for all } T \ge 0 \text{ and } s \ge 0.$$

Since according to assumption 4° we also have

$$P(Y_1 \ge s) - P(Z_1 \ge s) \le 0, \qquad P(Y_2 \ge s) - P(Z_2 \ge s) \le 0 \quad \text{for } s \ge 0,$$





both integrands in (2.1) and (2.2) are non-negative for all values of $s$, and we
conclude

$$P(Y \le T) - P(Z \le T) = I_1(T) + I_2(T) \ge 0,$$

and hence

$$P(Y \le T) - P(Z \le T) \ge 0 \quad \text{for } T \ge 0. \tag{3.1}$$

From assumption 2° one easily sees that $Y$ and $Z$ have symmetrical probability
distributions. This together with (3.1) leads to

$$P(Y \ge T) - P(Z \ge T) = P(Y \le -T) - P(Z \le -T) \le 0,$$

and thus to

$$P(|Y| \ge T) - P(|Z| \ge T) \le 0 \quad \text{for } T \ge 0.$$

As can be seen from (1.1) and (1.2), the assumptions of the Lemma, in particular
the assumption that all variables are continuous and the assumption 3°,
are rather special sufficient conditions for $Y$ being more peaked about 0 than $Z$.

Theorem 1. Let $Y$ and $Z$ be continuous random variables with probability
densities $\varphi(Y)$ and $f(Z)$ such that

1°. $\varphi(-Y) = \varphi(Y)$ for all $Y$, $f(-Z) = f(Z)$ for all $Z$,

2°. $\varphi(Y)$ and $f(Z)$ are non-increasing functions for positive values of the variables,

3°. $Y$ is more peaked about 0 than $Z$.

Let $Y_1, Y_2, \dots, Y_n$ and $Z_1, Z_2, \dots, Z_n$ be random samples of $Y$ and $Z$, respectively,
and $\bar{Y}_n = \frac{1}{n} \sum_{i=1}^{n} Y_i$, $\bar{Z}_n = \frac{1}{n} \sum_{i=1}^{n} Z_i$. Then $\bar{Y}_n$ is more peaked about 0 than $\bar{Z}_n$.

Proof. From the preceding Lemma one concludes by simple induction that
$Y' = Y_1 + Y_2 + \cdots + Y_n$ as well as $Z' = Z_1 + Z_2 + \cdots + Z_n$ are continuous
random variables with distributions symmetrical about zero and probability
densities non-increasing for positive values of the variables, such that $Y'$ is more
peaked about 0 than $Z'$. From this the theorem follows immediately.

The conjecture that assumption 2° of Theorem 1 might be superfluous is
incorrect, as may be seen from the following example.

Let $Y$ be any continuous random variable with a distribution symmetrical
about zero and such that $P(|Y| > a) = 0$ for some $a > 0$. Let $Z$ be the discrete
random variable with $P(Z = -a) = P(Z = a) = \tfrac{1}{2}$. We have for $0 \le T \le a$

$$P(|Y| \ge T) \le 1 = P(|Z| \ge T),$$

hence $Y$ is more peaked about 0 than $Z$. If $Y_1, Y_2$ and $Z_1, Z_2$ are random samples
of size 2, we have

$$P(\bar{Z}_2 = -a) = P(\bar{Z}_2 = a) = \tfrac{1}{4}, \qquad P(\bar{Z}_2 = 0) = \tfrac{1}{2},$$

and thus

$$P(|\bar{Z}_2| \ge T) = \tfrac{1}{2} \quad \text{for } 0 < T \le a.$$

The random variable $\bar{Y}_2$ is continuous, with a distribution symmetrical about
zero, such that $P(|\bar{Y}_2| \le a) = 1$. There exists, therefore, a $T_1$ such that
$0 < T_1 \le a$ and $P(|\bar{Y}_2| \ge T_1) = \tfrac{3}{4}$. It follows that

$$P(|\bar{Y}_2| \ge T_1) = \tfrac{3}{4} > \tfrac{1}{2} = P(|\bar{Z}_2| \ge T_1),$$

hence $\bar{Y}_2$ is not more peaked about zero than $\bar{Z}_2$. The random variable $Z$ is
discrete, but it can be approximated by a continuous random variable with a
U-shaped probability density, so that all the probabilities will be modified only
very slightly and $\bar{Y}_2$ still will not be more peaked than $\bar{Z}_2$. Nothing will change
in this example if one assumes that $Y$ fulfils condition 2° of Theorem 1.
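The counterexample can be made concrete. The Python sketch below assumes, for illustration only, that $a = 1$ and that $Y$ is uniform on $(-1, 1)$. This $Y$ is symmetric, bounded, and satisfies condition 2°. The mean of two such observations has a triangular density, so $P(|\bar{Y}_2| \ge T) = (1 - T)^2$ and $T_1 = 1 - \sqrt{3}/2$. The distribution of $\bar{Z}_2$ is enumerated exactly. The helper names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import product

def mean2_distribution(values_probs):
    """Exact distribution of the mean of two independent copies of a discrete variable."""
    dist = {}
    for (v1, p1), (v2, p2) in product(values_probs, repeat=2):
        m = Fraction(v1 + v2, 2)
        dist[m] = dist.get(m, 0) + p1 * p2
    return dist

a = 1
Z = [(-a, Fraction(1, 2)), (a, Fraction(1, 2))]   # P(Z = -a) = P(Z = a) = 1/2
Zbar2 = mean2_distribution(Z)                      # masses 1/4, 1/2, 1/4 at -1, 0, 1

def tail_Zbar2(T):
    """P(|Zbar_2| >= T) from the exact enumeration."""
    return sum(p for v, p in Zbar2.items() if abs(v) >= T)

def tail_Ybar2_uniform(T, a=1.0):
    """P(|Ybar_2| >= T) for the assumed Y uniform on (-a, a) (triangular mean)."""
    return (1 - T / a) ** 2 if 0 <= T <= a else 0.0

T1 = 1 - math.sqrt(3) / 2          # solves (1 - T)^2 = 3/4 when a = 1
print(Zbar2, tail_Ybar2_uniform(T1), tail_Zbar2(T1))
```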
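Theorem 2. Let $Y$ be a continuous random variable with probability density $\varphi(Y)$ such that

1°. $\varphi(-Y) = \varphi(Y)$ for all $Y$,

2°. $\varphi(Y)$ is a non-increasing function for $Y > 0$,

3°. $P(|Y| > a) = 0$ for some $a > 0$.

Let $Y_1, Y_2, \dots, Y_n$ be a random sample of size $n$ and $\bar{Y}_n = \frac{1}{n} \sum_{i=1}^{n} Y_i$. Then,
for any $y \ge 0$, we have

$$P(|\bar{Y}_n| \ge y) \le \Psi_n(y/a), \tag{4.1}$$

where

$$\Psi_n(t) = \frac{2}{n!} \sum_{\frac{n}{2}(t+1) < i \le n} (-1)^i \binom{n}{i} \left[\frac{n}{2}(t+1) - i\right]^n. \tag{4.2}$$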


Proof. Let $Z$ be the random variable with uniform distribution in the
interval $-1 \le Z \le 1$. If $Z_1, Z_2, \dots, Z_n$ is a random sample, then
$Z' = Z_1 + Z_2 + \cdots + Z_n$ has the cumulative probability function³

$$
P(Z' \le z) = \begin{cases} 0, & z < -n, \\ \dfrac{1}{n!} \displaystyle\sum_{0 \le i \le (z+n)/2} (-1)^i \binom{n}{i} \left(\frac{z+n}{2} - i\right)^n, & -n \le z \le n, \\ 1, & z > n, \end{cases}
$$

and $\bar{Z}_n = Z'/n$ has the cumulative probability function

$$
P(\bar{Z}_n \le t) = \begin{cases} 0, & t < -1, \\ \dfrac{1}{n!} \displaystyle\sum_{0 \le i \le \frac{n}{2}(t+1)} (-1)^i \binom{n}{i} \left[\frac{n}{2}(t+1) - i\right]^n, & -1 \le t \le 1, \\ 1, & t > 1. \end{cases}
$$

³ This expression is due to Laplace. For derivation and discussion, see: J. V. Uspensky,
Introduction to Mathematical Probability, McGraw-Hill, 1937, p. 279, and Cramér, op. cit.
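Thus,

$$P(|\bar{Z}_n| \ge t) = 2[1 - P(\bar{Z}_n \le t)] = 2\left\{1 - \frac{1}{n!} \sum_{0 \le i \le \frac{n}{2}(t+1)} (-1)^i \binom{n}{i} \left[\frac{n}{2}(t+1) - i\right]^n\right\},$$

and in view of the identity (valid for every real $x$)

$$\sum_{k=0}^{n} (-1)^k \binom{n}{k} (x - k)^n = n!$$

this becomes

$$P(|\bar{Z}_n| \ge t) = \frac{2}{n!} \sum_{\frac{n}{2}(t+1) < i \le n} (-1)^i \binom{n}{i} \left[\frac{n}{2}(t+1) - i\right]^n = \Psi_n(t)$$

for $0 \le t \le 1$. The random variable $Y/a$ is obviously more peaked about zero
than $Z$. Since $Y/a$ and $Z$ fulfil the assumptions of Theorem 1, it follows that
$\bar{Y}_n/a$ is more peaked about zero than $\bar{Z}_n$, that is

$$P\left(\left|\frac{\bar{Y}_n}{a}\right| \ge t\right) \le \Psi_n(t) \quad \text{for } t \ge 0.$$

Setting $at = y$, one obtains (4.1).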

For $n \to \infty$ the function $\Psi_n(t)$ approaches asymptotically the probability
$P(|X| \ge t\sqrt{3n})$ for the normalized normal random variable $X$.⁴ For $n = 8$
one obtains the following values, which indicate a good approximation:

t                    .3998    .5264    .6711
P(|X| ≥ t√24)        .05      .01      .001
Ψ_8(t)               .049     .0092    .0005


For smaller values of $n$, $\Psi_n(t)$ can be easily computed.
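A direct evaluation of (4.2) takes only a few lines. The Python sketch below computes $\Psi_n(t)$ from the finite sum and, for comparison, the normal approximation $P(|X| \ge t\sqrt{3n})$ via `math.erfc`. The function names are hypothetical. Because the alternating sum suffers floating-point cancellation as $n$ grows, this direct form is meant only for small $n$.

```python
import math

def psi(n, t):
    """Psi_n(t) from (4.2): P(|mean of n uniform(-1, 1) variables| >= t).

    Sums over (n/2)(t + 1) < i <= n; for t >= 1 the sum is empty and 0 is returned.
    """
    if t <= 0:
        return 1.0
    x = n * (t + 1) / 2
    total = 0.0
    for i in range(n + 1):
        if i > x:
            total += (-1) ** i * math.comb(n, i) * (x - i) ** n
    return 2 * total / math.factorial(n)

def normal_tail_scaled(n, t):
    """P(|X| >= t * sqrt(3 n)) for a standard normal X (the large-n approximation)."""
    return math.erfc(t * math.sqrt(3 * n) / math.sqrt(2))

for t in (0.3998, 0.5264, 0.6711):
    print(t, round(normal_tail_scaled(8, t), 4), round(psi(8, t), 4))
```

For $n = 1$ the sum reduces to $1 - t$, and for $n = 2$ it reduces to $(1 - t)^2$. These are the tails of the uniform and triangular distributions, which gives a quick consistency check.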


⁴ Cramér, op. cit., p. 245.


A METHOD FOR OBTAINING RANDOM NUMBERS

By H. Burke Horton
Interstate Commerce Commission

The need for large quantities of random numbers to be used in sample design,
subsampling, and other statistical problems is well known. Tippett's [1] numbers
have been widely used for these purposes, despite criticism directed at
their lack of randomness. The following procedure may be of interest to those
who wish to develop their own random series. The method described below will
ultimately be used to record extensive tables of random numbers for general use. 

Current methods of producing random numbers usually depend upon single 
operations of mechanical or electronic devices. These may be described as 
“single-stage” random number processes. The numerical results are biased to 
the same extent as the devices from which they are taken. 

At this point it is desirable to describe a process which may be called "compound"
randomization. Assume two roulette wheels arranged in series so that
the first controls the arrangement of symbols on the second wheel, while a turn
of the second wheel determines which of its positions is to be observed. If the
decimal system is used, the first wheel would have 10! "equally likely" positions,
and the second would have 10 "equally likely" positions. If three such wheels
were to be chained, the first would require (10!)! positions, the second 10! positions,
and the third 10 positions. In general, if $n$ wheels were to be chained,
the first would require $10(!)^{n-1}$ "equally likely" positions. It is not practical
to design such a machine.¹

One method of surmounting these difficulties is to shift to the binary system
in order to take advantage of the fact that 2! = 2; or, in general, $2(!)^n = 2$.
This property makes feasible the chaining of any number of machines in series;
and, furthermore, the machines can be of the same design. If desired, the results
taken from a single machine may be chained. Another important feature
is the ease of handling binary chains by electronic systems.

The words "equally likely" have been placed in quotation marks thus far to
indicate that the probabilities are as nearly equal as manufacturing precision
permits. Any simple single-stage device will have some bias, and it is this very
lack of true equality that the chaining process is designed to meet. For convenience
we may take as our binary symbols $+1$ and $-1$ rather than the customary
1 and 0. We adhere to the usual rules regarding the sign of a product.

Let $p_i$ be the probability of obtaining $+1$ in the $i$th trial (or in the $i$th machine
of a chain of machines), $0 < p_i < 1$; $q_i = 1 - p_i$ represents the probability
of obtaining $-1$ in the $i$th trial.

Let $P_i$ be the probability of obtaining $+1$ as the product of $i$ trials; $Q_i = 1 - P_i$
is the probability of obtaining $-1$ as the product of $i$ trials. The following
relationships can be set down immediately:

$$
\begin{aligned}
P_1 &= p_1, & Q_1 &= q_1, \\
P_2 &= P_1 p_2 + Q_1 q_2, & Q_2 &= P_1 q_2 + Q_1 p_2, \\
P_3 &= P_2 p_3 + Q_2 q_3, & Q_3 &= P_2 q_3 + Q_2 p_3, \\
&\;\;\vdots \\
P_i &= P_{i-1} p_i + Q_{i-1} q_i, & Q_i &= P_{i-1} q_i + Q_{i-1} p_i.
\end{aligned}
$$

¹ It has been pointed out by Dr. George W. Brown that a practical solution is possible
using any number base, $n$, by addition of random digits $(0, 1, 2, \dots, n - 1)$ modulo $n$.
We may calculate the bias, $P_k - \tfrac{1}{2}$, for a chain of $k$ trials:

$$
\begin{aligned}
P_k - \tfrac{1}{2} &= \tfrac{1}{2}(P_k - Q_k) \\
&= \tfrac{1}{2}(P_{k-1} p_k + Q_{k-1} q_k - P_{k-1} q_k - Q_{k-1} p_k).
\end{aligned}
$$

Factoring, we have

$$P_k - \tfrac{1}{2} = \tfrac{1}{2}(P_{k-1} - Q_{k-1})(p_k - q_k).$$

Substituting for $P_{k-1} - Q_{k-1}$ and factoring again,

$$P_k - \tfrac{1}{2} = \tfrac{1}{2}(P_{k-2} - Q_{k-2})(p_{k-1} - q_{k-1})(p_k - q_k).$$

Continuing the process of substituting and factoring, we obtain

$$P_k - \tfrac{1}{2} = \tfrac{1}{2}(p_1 - q_1)(p_2 - q_2) \cdots (p_k - q_k),$$

$$P_k - \tfrac{1}{2} = \tfrac{1}{2} \prod_{i=1}^{k} (p_i - q_i) = \tfrac{1}{2} \prod_{i=1}^{k} (2p_i - 1). \tag{1}$$

We may write the general formula for $P_k$:

$$P_k = \tfrac{1}{2}\left[1 + \prod_{i=1}^{k} (2p_i - 1)\right]. \tag{2}$$

In the special case where all the $p_i$ are equal to a constant, $p$,

$$P_k = \tfrac{1}{2}\left[1 + (2p - 1)^k\right]. \tag{3}$$

This can also be derived directly by expansion of $(p - q)^k$.

If any machine, $r$, in the chain has no bias ($p_r = \tfrac{1}{2}$ exactly), the chain itself
has no bias, since $2p_r - 1 = 0$. Note also that if for all $i$, $0 < p_i < 1$, the bias
of the complete chain is less than the bias of any component (single or multiple)
taken from the chain, because $|2p_i - 1| < 1$. Or, stated another way, the
results taken from any machine, no matter how nearly perfect, can be improved
by chaining with another machine, no matter how biased the latter. Even in
the limiting case, $p = 1$ (or 0), the magnitude of the bias remains unchanged;
in all other cases it is reduced. The bias of final results can be made as small as
desired by increasing the length of the chain. Compound randomization can be
regarded as an attrition process which may be used to reduce final bias below
any preassigned quantity. If the observations taken from two machines in the
chain should be perfectly correlated, the only effect is to shorten the chain by
two.
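The recursion for $P_i$ and the closed form (2) can be checked against each other in exact arithmetic. The Python sketch below does so. The function names and the sample probabilities are hypothetical. It starts from $P_0 = 1$, $Q_0 = 0$ (an empty product counts as $+1$), which reproduces $P_1 = p_1$.

```python
from fractions import Fraction
from functools import reduce

def chain_plus_probability(ps):
    """P_k via P_i = P_{i-1} p_i + Q_{i-1} q_i and Q_i = P_{i-1} q_i + Q_{i-1} p_i.

    ps holds p_1, ..., p_k, the probabilities of +1 for each independent trial.
    """
    P, Q = Fraction(1), Fraction(0)      # empty product is +1, so P_1 = p_1
    for p in ps:
        q = 1 - p
        P, Q = P * p + Q * q, P * q + Q * p
    return P

def chain_closed_form(ps):
    """Equation (2): P_k = 1/2 [1 + prod (2 p_i - 1)]."""
    product = reduce(lambda acc, p: acc * (2 * p - 1), ps, Fraction(1))
    return Fraction(1, 2) * (1 + product)

ps = [Fraction(6, 10), Fraction(7, 10), Fraction(55, 100)]
print(chain_plus_probability(ps), chain_closed_form(ps))   # 63/125 63/125
```

With these sample values the three individual biases are 0.1, 0.2, and 0.05, while the chain's bias is only $63/125 - 1/2 = 0.004$. This agrees with the attrition argument above.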

In shifting from the binary system to the decimal system, symbol bias will be 
introduced. In general, symbol bias will be introduced in passing from a given 
positional system to any other positional system, unless one of the number bases 
is a rational power of the other. 

To illustrate, let us assume that we have a random binary series and wish to
obtain a random one-digit decimal series. It will be necessary to tabulate the
binary series in blocks of four symbols. The quantities will range from 0000
(binary) to 1111 (binary), or from 00 (decimal) to 15 (decimal), with equal
probabilities. There would be no predominance of either ones or zeros in the
overall binary tabulation, as illustrated in the table below.


Binary system    Decimal system
0000             0
0001             1
0010             2
0011             3
0100             4
0101             5
0110             6
0111             7
1000             8
1001             9
Tabulation to this point: binary, 25 zeros and 15 ones; decimal, one of each symbol.
1010             10
1011             11
1100             12
1101             13
1110             14
1111             15
Overall tabulation: binary, 32 zeros and 32 ones; decimal (right digit only), 0 to 5 two each and 6 to 9 one each.


However, if we look at the right digit of the decimal tabulation, it is clear that the
symbols 0 to 5, inclusive, will occur twice as often as the symbols 6 to 9, inclusive.
The easiest way of correcting for this bias is simply to reject all two-digit decimal
numbers which occur, thereby giving equal probabilities to the ten decimal symbols.
The rejection could be accomplished most easily by electronic devices
operating on the binary numbers. All numbers greater than 1001 (binary)
would be excluded through the operation of a simple four-stage electronic
counter.

This simple illustration also demonstrates the inefficiency of converting random
four-digit binary numbers to random one-digit decimal numbers: 37.5%
of the data (6 of the 16 equally likely blocks) are lost in the process of removing bias. A more efficient procedure
would be to tabulate the random binary series in blocks of ten digits. The
largest number that could occur would be 1 111 111 111 (binary), or 1,023 (decimal).
The numbers would have equal probabilities insofar as this is attainable
by chaining. To obtain a random three-digit decimal series it would be necessary
to reject the numbers above 999 (decimal). This would amount to only
2.34% of the available data (24 of 1,024 values). As before, rejection could be accomplished easily
in the binary series by use of a ten-stage electronic counter.
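The rejection step and its cost can be sketched briefly. The Python code below assumes that the input is an already chained 0/1 series and reads it in fixed-length blocks. It keeps a block only if its value is below $10^d$, where $d$ is the largest integer with $10^d \le 2^{\text{block}}$. Four-bit blocks therefore yield one digit and ten-bit blocks yield three. The function names are hypothetical.

```python
def rejection_loss(block_bits, keep_below):
    """Fraction of equally likely block_bits-bit blocks rejected when only
    the values 0 .. keep_below - 1 are kept."""
    total = 2 ** block_bits
    return (total - keep_below) / total

def decimal_digits_from_bits(bits, block_bits):
    """Convert a 0/1 sequence into decimal digits by the rejection method.

    Each block of block_bits bits is read as a binary number; blocks whose value
    is at least 10**d are discarded, where d is the largest integer with
    10**d <= 2**block_bits (4 bits give 1 digit, 10 bits give 3 digits).
    """
    d = len(str(2 ** block_bits)) - 1
    out = []
    for start in range(0, len(bits) - block_bits + 1, block_bits):
        value = int("".join(str(b) for b in bits[start:start + block_bits]), 2)
        if value < 10 ** d:
            out.append(str(value).zfill(d))   # keep leading zeros, e.g. "007"
    return "".join(out)

print(rejection_loss(4, 10), rejection_loss(10, 1000))   # 0.375 0.0234375
```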

Several promising devices are being considered for tabulating random numbers
in accordance with the principles discussed herein. Electronic or electrical
systems actuated by cosmic rays seem to be the most desirable. Tabulating
equipment may be wired to turn out random numbers, possibly as a by-product
of other card runs.

If only a few random numbers are needed, they can be obtained by much
simpler methods. For example, a coin may be tossed, letting heads and tails
represent $+1$ and $-1$, respectively. The product of $k$ successive tosses would
be tabulated as the random binary variable. Products equal to $+1$ and $-1$
would be coded as 1 and 0, respectively. Blocks of binary symbols would then
be converted to the decimal system as described above.

REFERENCE

[1] Tippett, L. H. C., Random Sampling Numbers, Tracts for Computers, No. 15, Cambridge
University Press, 1927.


NOTE ON THE ERROR IN INTERPOLATION OF A FUNCTION OF TWO 
INDEPENDENT VARIABLES 

By W. M. Kincaid 

University of Michigan 

Suppose that $g$ is a function of one real variable $x$ and $h$ is an interpolation function
such that $g(x) = h(x)$ for $x = x_1, x_2, \dots, x_n$. Let $f(x) = g(x) - h(x)$,
and suppose that $d^n f(x)/dx^n$ exists in an interval containing the points $x_0, x_1, \dots, x_n$.
Then the error in interpolation may be estimated from the well-known
relation

$$f(x_0) = \frac{f^{(n)}(\xi)}{n!} (x_0 - x_1)(x_0 - x_2) \cdots (x_0 - x_n), \tag{1}$$

where $\xi$ is some point in the smallest interval containing $x_0, x_1, \dots, x_n$.

In the most usual case, where $h(x)$ is a polynomial of degree less than $n$, we
have $f^{(n)}(\xi) = g^{(n)}(\xi)$.

It is natural to consider the corresponding situation for functions of two independent
real variables $x$ and $y$. Let $g$ and $h$ be two functions such that $g(x, y) = h(x, y)$
for $n$ points $x = x_i$, $y = y_i$ $(i = 1, 2, \dots, n)$. Setting $f(x, y) = g(x, y) - h(x, y)$
as before, we have $f(x_i, y_i) = 0$ for $i = 1, 2, \dots, n$. Then if $(x_0, y_0)$
is a point at which $g$ and $h$ are defined, we may ask whether there is any formula
corresponding to (1) from which the error $f(x_0, y_0)$ can be estimated.

Some restrictions must be placed upon the function $f$ if any interesting results
are to be obtained. Let us suppose that $f(x, y)$ can be expanded in a Taylor
series about each of the points $(x_i, y_i)$ $(i = 0, 1, \dots, n)$ with a region of convergence
sufficient to include all the points of the set. These conditions are more
stringent than will be required for obtaining the later results; on the other
hand, they would almost always be satisfied in any practical problem of interpolation,
so it scarcely seems worthwhile to look for the weakest possible conditions
at this point.

The first case of real interest is $n = 3$. It follows from the general statement
of Taylor's theorem with the remainder that

$$
\begin{aligned}
0 = f(x_i, y_i) = {} & f(x_0, y_0) + (x_i - x_0) f_x(x_0, y_0) + (y_i - y_0) f_y(x_0, y_0) \\
& + \tfrac{1}{2}\left[(x_i - x_0)^2 f_{xx}(\xi_i, \eta_i) + 2(x_i - x_0)(y_i - y_0) f_{xy}(\xi_i, \eta_i) + (y_i - y_0)^2 f_{yy}(\xi_i, \eta_i)\right], \quad (i = 1, 2, 3),
\end{aligned} \tag{2}
$$

where $(\xi_i, \eta_i)$ is a point on the line segment joining $(x_0, y_0)$ and $(x_i, y_i)$ for
$i = 1, 2, 3$.

The equation (2) may be regarded as a set of three linear equations in the two
quantities $f_x(x_0, y_0)$ and $f_y(x_0, y_0)$. The condition that these shall be consistent is

$$
\begin{vmatrix}
f(x_0, y_0) + U_1 & x_1 - x_0 & y_1 - y_0 \\
f(x_0, y_0) + U_2 & x_2 - x_0 & y_2 - y_0 \\
f(x_0, y_0) + U_3 & x_3 - x_0 & y_3 - y_0
\end{vmatrix} = 0, \tag{3}
$$

where

$$U_i = \tfrac{1}{2}\left[(x_i - x_0)^2 f_{xx}(\xi_i, \eta_i) + 2(x_i - x_0)(y_i - y_0) f_{xy}(\xi_i, \eta_i) + (y_i - y_0)^2 f_{yy}(\xi_i, \eta_i)\right], \qquad (i = 1, 2, 3).$$

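If the three points $(x_i, y_i)$ $(i = 1, 2, 3)$ are not in a straight line, (3) can be
written in the form (expanding along the first column, and noting that subtracting
$x_0$ and $y_0$ times a column of ones does not change a determinant)

$$
f(x_0, y_0) = -\frac{\begin{vmatrix} U_1 & x_1 - x_0 & y_1 - y_0 \\ U_2 & x_2 - x_0 & y_2 - y_0 \\ U_3 & x_3 - x_0 & y_3 - y_0 \end{vmatrix}}{\begin{vmatrix} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ 1 & x_3 & y_3 \end{vmatrix}}. \tag{4}
$$

This expression is analogous to (1), though far less simple and elegant in form.

A similar treatment can evidently be used in all cases of the type $n = m(m + 1)/2$.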
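Formula (4) can be verified exactly in a case where the remainder terms are known. The Python sketch below makes that assumption by taking a hypothetical quadratic $g$, whose second derivatives are constant, and interpolating it with $h(x, y) = a + bx + cy$ through three non-collinear points. Then $f = g - h$ has the same constant second derivatives, so each $U_i$ is exact. The value of $f(x_0, y_0)$ computed directly must then agree with (4). The test function, the points, and the helper names are illustrative choices, not taken from the note.

```python
from fractions import Fraction as Fr

def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def linear_interpolant(pts, vals):
    """Coefficients (a, b, c) of h(x, y) = a + b x + c y through three points (Cramer's rule)."""
    D = det3([[1, x, y] for x, y in pts])
    a = det3([[v, x, y] for (x, y), v in zip(pts, vals)]) / D
    b = det3([[1, v, y] for (x, y), v in zip(pts, vals)]) / D
    c = det3([[1, x, v] for (x, y), v in zip(pts, vals)]) / D
    return a, b, c

def g(x, y):
    # Hypothetical quadratic test function: g_xx = 6, g_xy = -2, g_yy = 10 everywhere.
    return 3 * x * x - 2 * x * y + 5 * y * y + x - 4 * y + 7

GXX, GXY, GYY = 6, -2, 10
pts = [(Fr(0), Fr(0)), (Fr(2), Fr(1)), (Fr(1), Fr(3))]   # not collinear
x0, y0 = Fr(1), Fr(1)

a, b, c = linear_interpolant(pts, [g(x, y) for x, y in pts])
direct = g(x0, y0) - (a + b * x0 + c * y0)       # f(x0, y0) = g - h at (x0, y0)

# U_i as defined after (3); exact here because f has constant second derivatives.
U = [Fr(1, 2) * ((x - x0) ** 2 * GXX + 2 * (x - x0) * (y - y0) * GXY + (y - y0) ** 2 * GYY)
     for x, y in pts]
num = det3([[u, x - x0, y - y0] for u, (x, y) in zip(U, pts)])
den = det3([[1, x, y] for x, y in pts])
via_formula_4 = -num / den

print(direct, via_formula_4)   # -38/5 -38/5
```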
For example, for $n = 6$ the equation corresponding to (4) is

$$
f(x_0, y_0) = -\frac{\begin{vmatrix}
V_1 & x_1 - x_0 & y_1 - y_0 & (x_1 - x_0)^2 & (x_1 - x_0)(y_1 - y_0) & (y_1 - y_0)^2 \\
V_2 & x_2 - x_0 & y_2 - y_0 & (x_2 - x_0)^2 & (x_2 - x_0)(y_2 - y_0) & (y_2 - y_0)^2 \\
V_3 & x_3 - x_0 & y_3 - y_0 & (x_3 - x_0)^2 & (x_3 - x_0)(y_3 - y_0) & (y_3 - y_0)^2 \\
V_4 & x_4 - x_0 & y_4 - y_0 & (x_4 - x_0)^2 & (x_4 - x_0)(y_4 - y_0) & (y_4 - y_0)^2 \\
V_5 & x_5 - x_0 & y_5 - y_0 & (x_5 - x_0)^2 & (x_5 - x_0)(y_5 - y_0) & (y_5 - y_0)^2 \\
V_6 & x_6 - x_0 & y_6 - y_0 & (x_6 - x_0)^2 & (x_6 - x_0)(y_6 - y_0) & (y_6 - y_0)^2
\end{vmatrix}}{\begin{vmatrix}
1 & x_1 & y_1 & x_1^2 & x_1 y_1 & y_1^2 \\
1 & x_2 & y_2 & x_2^2 & x_2 y_2 & y_2^2 \\
1 & x_3 & y_3 & x_3^2 & x_3 y_3 & y_3^2 \\
1 & x_4 & y_4 & x_4^2 & x_4 y_4 & y_4^2 \\
1 & x_5 & y_5 & x_5^2 & x_5 y_5 & y_5^2 \\
1 & x_6 & y_6 & x_6^2 & x_6 y_6 & y_6^2
\end{vmatrix}}, \tag{5}
$$

where

$$V_i = \tfrac{1}{6}\left[(x_i - x_0)^3 f_{xxx}(\xi_i, \eta_i) + 3(x_i - x_0)^2 (y_i - y_0) f_{xxy}(\xi_i, \eta_i) + 3(x_i - x_0)(y_i - y_0)^2 f_{xyy}(\xi_i, \eta_i) + (y_i - y_0)^3 f_{yyy}(\xi_i, \eta_i)\right], \qquad (i = 1, 2, \dots, 6).$$

(Equation (5) breaks down only if the six points $(x_1, y_1), \dots, (x_6, y_6)$ lie on a
single conic.)

As an example of the general case we may consider $n = 4$. We write

$$
f(x_i, y_i) = f(x_0, y_0) + (x_i - x_0) f_x(x_0, y_0) + (y_i - y_0) f_y(x_0, y_0) + \tfrac{1}{2}\left[(x_i - x_0)^2 f_{xx}(\xi_i, \eta_i) + 2(x_i - x_0)(y_i - y_0) f_{xy}(\xi_i, \eta_i) + (y_i - y_0)^2 f_{yy}(\xi_i, \eta_i)\right] \quad (i = 1, 2, 3, 4).
$$

Now,

$$
f_{xx}(\xi_i, \eta_i) = f_{xx}(x_0, y_0) + (\xi_i - x_0) f_{xxx}(\xi_i', \eta_i') + (\eta_i - y_0) f_{xxy}(\xi_i', \eta_i'),
$$

where $(\xi_i', \eta_i')$ is a point on the line segment between $(x_0, y_0)$ and $(\xi_i, \eta_i)$. Proceeding as before yields


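$$
f(x_0, y_0) = -\,\frac{\begin{vmatrix}
W_1 & x_1 - x_0 & y_1 - y_0 & (x_1 - x_0)^2 \\
W_2 & x_2 - x_0 & y_2 - y_0 & (x_2 - x_0)^2 \\
W_3 & x_3 - x_0 & y_3 - y_0 & (x_3 - x_0)^2 \\
W_4 & x_4 - x_0 & y_4 - y_0 & (x_4 - x_0)^2
\end{vmatrix}}{\begin{vmatrix}
1 & x_1 & y_1 & x_1^2 \\
1 & x_2 & y_2 & x_2^2 \\
1 & x_3 & y_3 & x_3^2 \\
1 & x_4 & y_4 & x_4^2
\end{vmatrix}}, \tag{6}
$$

with

$$
W_i = \tfrac{1}{2}\left[(x_i - x_0)^2(\xi_i - x_0) f_{xxx}(\xi_i', \eta_i') + (x_i - x_0)^2(\eta_i - y_0) f_{xxy}(\xi_i', \eta_i') + 2(x_i - x_0)(y_i - y_0) f_{xy}(\xi_i, \eta_i) + (y_i - y_0)^2 f_{yy}(\xi_i, \eta_i)\right].
$$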

Corresponding formulas can be derived in this way for any value of $n$; in fact, several alternatives may be obtained in each case. In all cases the error $f(x_0, y_0)$ is given in terms of the derivatives of $g$ alone if a polynomial of a certain type is used for the interpolating function. For equation (4), the suitable polynomial would be $h(x, y) = a + bx + cy$; for (5), $h(x, y) = a + bx + cy + dx^2 + exy + fy^2$; for (6), $h(x, y) = a + bx + cy + dx^2$. If the interpolating function $h(x, y)$ is not so chosen, the formulas remain valid, but derivatives of $h$ will appear.
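To illustrate the remark about (4), here is a small exact-arithmetic sketch. It is our own check and assumes a quadratic test function $g$, so that $g_{xx}, g_{xy}, g_{yy}$ are constants. With the suitable linear interpolant $h = a + bx + cy$, the error $f = g - h$ has the second derivatives of $g$, and the right-hand side of (4) should reproduce $g(x_0, y_0) - h(x_0, y_0)$ exactly. The function `g`, the points, and the helper names are hypothetical.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g_, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g_) + c * (d * h - e * g_)

# Hypothetical quadratic g(x, y) = 2 - x + 3y + x^2 + 4xy - 2y^2.
def g(x, y):
    return 2 - x + 3 * y + x * x + 4 * x * y - 2 * y * y

G_XX, G_XY, G_YY = 2, 4, -4  # its (constant) second derivatives

def linear_interpolant(pts):
    """h(x, y) = a + bx + cy agreeing with g at three non-collinear points (Cramer's rule)."""
    rows = [[1, x, y] for x, y in pts]
    vals = [g(x, y) for x, y in pts]
    d = Fraction(det3(rows))
    a, b, c = (
        Fraction(det3([row[:k] + [v] + row[k + 1:] for row, v in zip(rows, vals)])) / d
        for k in range(3)
    )
    return lambda x, y: a + b * x + c * y

def error_by_formula_4(pts, x0, y0):
    """Right-hand side of (4); with h linear, f = g - h has the second derivatives of g."""
    num, den = [], []
    for x, y in pts:
        dx, dy = x - x0, y - y0
        u = (dx * dx * G_XX + 2 * dx * dy * G_XY + dy * dy * G_YY) / 2  # U_i
        num.append([u, dx, dy])
        den.append([1, x, y])
    return -Fraction(det3(num)) / Fraction(det3(den))

pts = [(Fraction(0), Fraction(0)), (Fraction(1), Fraction(0)), (Fraction(0), Fraction(1))]
x0, y0 = Fraction(1, 3), Fraction(1, 4)
h = linear_interpolant(pts)
print(g(x0, y0) - h(x0, y0), error_by_formula_4(pts, x0, y0))  # 35/72 35/72
```

Because $g$ is quadratic, the points $(\xi_i, \eta_i)$ play no role here. For a general $g$ the two printed values would differ by the variation of the second derivatives over the triangle.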

The same procedure is applicable to functions of any number of independent 
variables. 


ON A LEMMA BY KOLMOGOROFF 

By Kai-Lai Chung 
Princeton University 

The following lemma was proved by Kolmogoroff [1]: 

If $e_1, e_2, \ldots, e_n$ are independent events and $U$ an arbitrary event such that ($W(X)$ denoting the probability of $X$ and $W_e(X)$ the conditional probability of $X$ under the hypothesis $e$)

$$
W_{e_k}(U) \ge u \quad (k = 1, \ldots, n), \qquad W(e_1 + \cdots + e_n) \ge u,
$$

then

$$
W(U) \ge \frac{u^2}{9}.
$$

This result seems of some interest in itself and may also have practical applications, for it is easily seen [2] that in general, if $e_1, e_2, \ldots, e_n$ are arbitrary, no information about $W_{e_1 + \cdots + e_n}(U)$ can be obtained from that about $W_{e_k}(U)$, $k = 1, \ldots, n$. From this point of view the constant $1/9$ is interesting, though it is unimportant in Kolmogoroff's proof of the law of large numbers. Using his original method this constant can easily be improved to $1/8$. However, the following method will give a better result. At the same time we shall put it into a more general form.

Let

$$
\sum_{k=1}^{n} W(e_k) \ge \beta, \qquad W_{e_k}(U) \ge \alpha \quad (k = 1, \ldots, n).
$$





Then we have for $1 \le k \le n$,

$$
W(U) \ge W\bigl(U(e_1 + \cdots + e_k)\bigr) = W(Ue_1 + \cdots + Ue_k). \tag{1}
$$

Now a simple case of certain inequalities due to Bonferroni and Fréchet [3] states that for arbitrary events $E_1, \ldots, E_k$ we have

$$
W(E_1 + \cdots + E_k) \ge \sum_{i=1}^{k} W(E_i) - \sum_{1 \le i < j \le k} W(E_i E_j). \tag{2}
$$

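Applying this to (1), we obtain

$$
W(U) \ge \sum_{i=1}^{k} W(Ue_i) - \sum_{1 \le i < j \le k} W(Ue_i e_j)
\ge \sum_{i=1}^{k} W(e_i)\,W_{e_i}(U) - \sum_{1 \le i < j \le k} W(e_i)\,W(e_j),
$$

using the independence of $e_1, \ldots, e_k$. Hence

$$
W(U) \ge \alpha \sum_{i=1}^{k} W(e_i) - \frac{1}{2}\Bigl(\sum_{i=1}^{k} W(e_i)\Bigr)^2 + \frac{1}{2} \sum_{i=1}^{k} W^2(e_i).
$$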

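By Cauchy's inequality,

$$
\sum_{i=1}^{k} W^2(e_i) \ge \frac{1}{k}\Bigl(\sum_{i=1}^{k} W(e_i)\Bigr)^2.
$$

Writing $s_k = \sum_{i=1}^{k} W(e_i)$, we have

$$
W(U) \ge \alpha s_k - \frac{1}{2}\Bigl(1 - \frac{1}{k}\Bigr) s_k^2. \tag{3}
$$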
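Inequality (3) holds for every event $U$ and every $k \le n$. The Python sketch below is our illustration, not part of the note. It enumerates the finite probability space generated by $n$ independent events, takes $\alpha$ to be the smallest $W_{e_i}(U)$, and verifies (3) for each $k$ in exact arithmetic. The probabilities `p` and the events `U` are hypothetical examples.

```python
from fractions import Fraction
from itertools import product
from math import prod

def outcomes(p):
    """All 2^n outcomes of n independent events with W(e_i) = p[i], with their probabilities."""
    for omega in product((0, 1), repeat=len(p)):
        yield omega, prod(pi if bit else 1 - pi for pi, bit in zip(p, omega))

def satisfies_3(p, in_U):
    """Check W(U) >= alpha*s_k - (1/2)(1 - 1/k) s_k^2 for every k = 1, ..., n."""
    w_U = sum(w for omega, w in outcomes(p) if in_U(omega))
    # alpha: the smallest conditional probability W_{e_i}(U) = W(U e_i) / W(e_i).
    alpha = min(
        sum(w for omega, w in outcomes(p) if omega[i] and in_U(omega)) / p[i]
        for i in range(len(p))
    )
    for k in range(1, len(p) + 1):
        s_k = sum(p[:k])
        if w_U < alpha * s_k - Fraction(1, 2) * (1 - Fraction(1, k)) * s_k**2:
            return False
    return True

p = [Fraction(1, 5), Fraction(1, 3), Fraction(1, 4), Fraction(1, 2)]
print(satisfies_3(p, lambda omega: any(omega)))                    # U = e_1 + ... + e_4
print(satisfies_3(p, lambda omega: omega[0] or sum(omega) >= 2))  # another event U
```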

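Now let $0 < \gamma < \gamma_0 < 1$, where $\gamma$ and $\gamma_0$ are to be determined later. If there is an $e_i$, $1 \le i \le n$, such that $W(e_i) \ge \gamma\beta$, then

$$
W(U) \ge W(Ue_i) = W(e_i)\,W_{e_i}(U) \ge \gamma\alpha\beta. \tag{4}
$$

If every $W(e_i) < \gamma\beta$, we determine $l\ (> 1)$ such that

$$
s_{l-1} < \gamma_0\beta \le s_l;
$$

thus

$$
\gamma_0\beta \le s_l < (\gamma_0 + \gamma)\beta.
$$

And (3) yields

$$
W(U) \ge s_l\Bigl[\alpha - \frac{1}{2}\Bigl(1 - \frac{1}{l}\Bigr)s_l\Bigr] \ge \gamma_0\beta\Bigl[\alpha - \frac{1}{2}\Bigl(1 - \frac{1}{l}\Bigr)(\gamma_0 + \gamma)\beta\Bigr]. \tag{5}
$$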
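Now we choose $\gamma$ so that the last terms in (4) and (5) are equal. This gives

$$
\gamma = \gamma_0\,\frac{2\alpha - \bigl(1 - \frac{1}{l}\bigr)\gamma_0\beta}{2\alpha + \bigl(1 - \frac{1}{l}\bigr)\gamma_0\beta}.
$$

To maximize $\gamma$, we put $d\gamma/d\gamma_0 = 0$ and find

$$
\gamma_0 = \frac{2(\sqrt{2} - 1)\alpha}{\beta}.
$$

If $2(\sqrt{2} - 1)\alpha \le \beta$, this choice of $\gamma_0$ is admissible, and we obtain

$$
\gamma = \frac{2 - \sqrt{2} + \frac{1}{l}(\sqrt{2} - 1)}{\sqrt{2} - \frac{1}{l}(\sqrt{2} - 1)} \cdot \frac{2(\sqrt{2} - 1)\alpha}{\beta}.
$$

Thus we get (the first inequality being retained for small values of $l$)

$$
W(U) \ge \frac{2 - \sqrt{2} + \frac{1}{l}(\sqrt{2} - 1)}{\sqrt{2} - \frac{1}{l}(\sqrt{2} - 1)} \cdot 2(\sqrt{2} - 1)\alpha^2 \ge 2(\sqrt{2} - 1)^2\alpha^2. \tag{6}
$$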

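In case $2(\sqrt{2} - 1)\alpha > \beta$, we choose $\gamma_0 = 1$, and we obtain

$$
\gamma = \frac{2\alpha - \bigl(1 - \frac{1}{l}\bigr)\beta}{2\alpha + \bigl(1 - \frac{1}{l}\bigr)\beta}.
$$

Thus we get

$$
W(U) \ge \frac{2\alpha - \bigl(1 - \frac{1}{l}\bigr)\beta}{2\alpha + \bigl(1 - \frac{1}{l}\bigr)\beta}\,\alpha\beta \ge \frac{2\alpha - \beta}{2\alpha + \beta}\,\alpha\beta.
$$

If we write $\xi = \beta/\alpha$, we have

$$
W(U) \ge \frac{2 - \xi}{2 + \xi}\,\xi\alpha^2. \tag{7}
$$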
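We summarize (6) and (7) in the following table:

$$
\begin{array}{c|c}
\xi = \beta/\alpha & W(U) \ge \\ \hline
\xi \ge 2(\sqrt{2} - 1) & 2(\sqrt{2} - 1)^2\alpha^2 \\
\xi < 2(\sqrt{2} - 1) & \dfrac{2 - \xi}{2 + \xi}\,\xi\alpha^2
\end{array}
$$

Thus for Kolmogoroff's case ($\beta = \alpha$) we have $W(U) \ge 2(\sqrt{2} - 1)^2\alpha^2$.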
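A short numerical check of the constants follows. It is our own arithmetic, not part of the note. It drops the $1/l$ terms, which is their least favourable value, and writes $t = \gamma_0\beta/\alpha$. The bound $\gamma\alpha\beta$ then equals $\alpha^2 \, t(2 - t)/(2 + t)$. A grid search should locate the maximizer near $2(\sqrt{2} - 1)$ with value $2(\sqrt{2} - 1)^2 \approx 0.343$. With $\gamma_0 = 1$ we have $t = \xi$, so the same function at $\xi$ reproduces the second row of the table. The function names are hypothetical.

```python
import math

def bound_coefficient(t):
    """Coefficient of alpha^2 in gamma*alpha*beta, t = gamma_0*beta/alpha, 1/l terms dropped."""
    return t * (2 - t) / (2 + t)

grid = [i / 100_000 for i in range(1, 200_000)]  # t in (0, 2)
t_best = max(grid, key=bound_coefficient)
t_exact = 2 * (math.sqrt(2) - 1)
c_exact = 2 * (math.sqrt(2) - 1) ** 2

def table_bound(xi):
    """Coefficient of alpha^2 in the table, xi = beta / alpha."""
    return bound_coefficient(min(xi, t_exact))

print(round(t_best, 4), round(bound_coefficient(t_best), 4))  # 0.8284 0.3431
print(table_bound(1.0) > 1 / 8 > 1 / 9)                       # True: beats 1/8 and 1/9
```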

REFERENCES

[1] A. Kolmogoroff, “Bemerkungen zu meiner Arbeit ‘Über die Summen zufälliger Grössen’,” Math. Annalen, Vol. 102 (1929), pp. 484-488.

[2] K. L. Chung, “On mutually favorable events,” Annals of Math. Stat., Vol. 13 (1942), pp. 338-349.

[3] M. Fréchet, Les probabilités associées à un système d'événements compatibles et dépendants, Première partie, Hermann, Paris, 1939, p. 59.


APPROXIMATE WEIGHTS 

By John W. Tukey 
Princeton University 

1. Summary. The greatest fractional increase in variance when a weighted 
mean is calculated with approximate weights is, quite closely, the square of the 
largest fractional error in an individual weight. The average increase will be 
about one-half this amount. 

The use of weights accurate to two significant figures, or even to the nearest number of the form 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 55, 60, 65, 70, 75, 80, 85, 90, or 95, that is to say, of the form $10(1)20(2)50(5)100 \times 10^r$, can thus reduce efficiency by at most ¼ percent, which is negligible in almost all applications.

2. Proof. Let the optimum weights be $W_i$, $i = 1, 2, \ldots, n$, with $W_i > 0$, where it is convenient to choose the normalization $\sum W_i = 1$. Let $\sigma^2$ be the variance of $\sum W_i x_i$; then the variance of each $x_i$ must be $\sigma^2/W_i$, and since this is a weighted mean, the means of the $x_i$ are the same.

Let the approximate weights be $W_i(1 + \lambda\theta_i)$, where $0 < \lambda < 1$ and $|\theta_i| \le 1$, $i = 1, 2, \ldots, n$. Thus $\lambda$ is the largest fractional error which may be made in the situation considered. We need the weak requirement $\lambda < 1$. The approximately weighted mean is

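$$
\frac{\sum W_i(1 + \lambda\theta_i)\,x_i}{\sum W_i(1 + \lambda\theta_i)} = \sum W_i\,\frac{1 + \lambda\theta_i}{1 + \lambda\bar\theta}\,x_i,
$$

where $\bar\theta = \sum W_i\theta_i$. Its variance is

$$
\sigma^2\sum W_i\Bigl(\frac{1 + \lambda\theta_i}{1 + \lambda\bar\theta}\Bigr)^2
= \frac{\sigma^2}{(1 + \lambda\bar\theta)^2}\Bigl[(1 + \lambda\bar\theta)^2 + 2\lambda(1 + \lambda\bar\theta)\sum W_i(\theta_i - \bar\theta) + \lambda^2\sum W_i(\theta_i - \bar\theta)^2\Bigr]
= \sigma^2\Bigl[1 + \frac{\lambda^2\sum W_i(\theta_i - \bar\theta)^2}{(1 + \lambda\bar\theta)^2}\Bigr],
$$

the middle term vanishing because $\sum W_i(\theta_i - \bar\theta) = 0$. Since $\sum W_i(\theta_i - \bar\theta)^2 = \sum W_i\theta_i^2 - \bar\theta^2$ and $\sum W_i\theta_i^2 \le 1$, this is bounded by

$$
\sigma^2\Bigl[1 + \lambda^2\,\frac{1 - \bar\theta^2}{(1 + \lambda\bar\theta)^2}\Bigr].
$$

Now the only maximum of this expression for $|\bar\theta| \le 1$ occurs when $\bar\theta = -\lambda$, and the bound becomes

$$
\sigma^2\Bigl[1 + \frac{\lambda^2}{1 - \lambda^2}\Bigr].
$$

This proves the first statement in the summary.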
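The bound can be checked numerically. The sketch below is our own, with hypothetical random weights and errors. It normalizes $\sum W_i = 1$, assumes $\operatorname{Var}(x_i) = \sigma^2/W_i$ as in the proof, and computes the variance ratio $\sum W_i(1 + \lambda\theta_i)^2/(1 + \lambda\bar\theta)^2$. It confirms that the ratio never exceeds $1 + \lambda^2/(1 - \lambda^2)$. It also exhibits a two-weight case with $\theta_i = \pm 1$ and $\bar\theta = -\lambda$ that attains the bound.

```python
import math
import random

def variance_ratio(W, theta, lam):
    """Var(approximately weighted mean) / sigma^2, assuming sum(W) == 1 and Var(x_i) = sigma^2 / W_i."""
    tbar = sum(w * t for w, t in zip(W, theta))
    return sum(w * (1 + lam * t) ** 2 for w, t in zip(W, theta)) / (1 + lam * tbar) ** 2

def tukey_bound(lam):
    return 1 + lam**2 / (1 - lam**2)

rng = random.Random(1)
lam = 0.3
worst = 0.0
for _ in range(10_000):
    n = rng.randint(2, 8)
    raw = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(raw)
    W = [r / total for r in raw]                    # normalized so that sum(W) == 1
    theta = [rng.uniform(-1, 1) for _ in range(n)]  # |theta_i| <= 1
    worst = max(worst, variance_ratio(W, theta, lam))

# Two weights with theta = +1, -1 chosen so that theta-bar = -lam: the bound is attained.
W_star = [(1 - lam) / 2, (1 + lam) / 2]
attained = variance_ratio(W_star, [1.0, -1.0], lam)
print(worst <= tukey_bound(lam), math.isclose(attained, tukey_bound(lam)))
```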

The greatest fractional change which occurs when a number is approximated by one of the form $10(1)20(2)50(5)100 \times 10^r$ is 5/105, which occurs, for example, when 10.499999… is replaced by 10. The same estimate applies to an approximation to two significant figures. The variance is thus multiplied by a factor bounded by

$$
1 + \frac{(5/105)^2}{1 - (5/105)^2} = 1 + \frac{1}{440} \approx 1.0023,
$$

which proves the second statement.

The use of a weight of the simpler form 10, 15, 20, 30, 40, 50, 70, times a power of ten, is seen in the same way to lead to an increase in variance and a decrease in efficiency of at most 4⅙ percent.

3. Comment. It is interesting to compare the 90 possible values for 2 significant figures, the 35 possible values for the numbers proposed above, which might be called two curtailed significant figures, and the 24 possible values for logarithmic spacing at interval (1.05)², all of which extend over one power of ten with the same maximum fractional error in rounding. The use of the curtailed scheme for critical tables of weights and weighting coefficients would save more than 60 percent of the entries needed for two complete significant figures.
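The rounding figures in this section can be recomputed with the following sketch, which is our own. It assumes rounding to the nearest grid point within one decade, with 100 closing the decade. In each gap $(a, b)$ the worst fractional error occurs at the midpoint and equals $(b - a)/(a + b)$. The sketch then applies the bound $1 + \lambda^2/(1 - \lambda^2)$ and counts the entries per decade. The grid and function names are hypothetical.

```python
import math
from fractions import Fraction

# One decade of each rounding grid (the next decade starts again at 10 x 10^r).
two_sig = list(range(10, 100))                                       # 10, 11, ..., 99
curtailed = [*range(10, 20), *range(20, 50, 2), *range(50, 100, 5)]  # 10(1)20(2)50(5)100
simpler = [10, 15, 20, 30, 40, 50, 70]

def max_fractional_error(grid):
    """Largest |approx - true| / true when rounding to the nearest grid point.

    The worst case in each gap (a, b) is a true value at the midpoint (a + b) / 2,
    which gives (b - a) / (a + b); 100 closes the decade.
    """
    pts = grid + [100]
    return max(Fraction(b - a, a + b) for a, b in zip(pts, pts[1:]))

def variance_factor_bound(lam):
    """The bound 1 + lam^2 / (1 - lam^2) on the factor multiplying the variance."""
    return 1 + lam**2 / (1 - lam**2)

for name, grid in [("two significant", two_sig), ("curtailed", curtailed), ("simpler", simpler)]:
    lam = max_fractional_error(grid)
    print(name, len(grid), lam, variance_factor_bound(lam))

log_count = math.ceil(math.log(10) / math.log(1.05**2))  # points per decade at ratio (1.05)^2
saving = 1 - len(curtailed) / len(two_sig)
print(log_count, round(saving, 3))
```

The loop should report λ = 1/21 and a factor of 441/440 for both two-figure schemes, and λ = 1/5 with a factor of 25/24 for the simpler scheme.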

This device applies equally well to other numbers of significant figures. 
ON THE USE OF THE NON-CENTRAL t-DISTRIBUTION FOR COMPARING PERCENTAGE POINTS OF NORMAL POPULATIONS

By John E. Walsh 
Princeton University 

1. Introduction. Consider two normal populations with the same variance and means $\mu$ and $\nu$ respectively. It is well known that confidence intervals and significance tests can be obtained for the difference $\mu - \nu$. Since $\mu$ is the 50% point of the first population and $\nu$ is the 50% point of the second population, this represents a particular solution of the general problem of obtaining confidence intervals and significance tests for the difference $\theta_\alpha - \varphi_\beta$, where $\theta_\alpha$ is the $\alpha$ percent point of the first population and $\varphi_\beta$ is the $\beta$ percent point of the second population. The purpose of this note is to point out that the results of Johnson and Welch [1] for the non-central $t$-distribution can be used to furnish a solution of the general problem.

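2. Analysis. Let $\lambda_\gamma$ be the $\gamma$ percent point of the normal population with zero mean and unit variance (i.e., exactly $\gamma$% of the population has values less than $\lambda_\gamma$). Then if $\sigma$ is the common standard deviation,

$$
\theta_\alpha = \mu + \lambda_\alpha\sigma, \qquad \varphi_\beta = \nu + \lambda_\beta\sigma.
$$

Thus

$$
\theta_\alpha - \varphi_\beta = (\mu - \nu) + (\lambda_\alpha - \lambda_\beta)\sigma.
$$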

The non-central $t$-distribution investigated by Johnson and Welch in [1] is based on the quantity

$$
t = \frac{z + \delta}{\sqrt{\chi^2/f}},
$$

where $z$ has a normal distribution with zero mean and unit variance, $\delta$ is a constant, and $\chi^2$ has a $\chi^2$-distribution with $f$ degrees of freedom and is distributed independently of $z$. Methods and tables are given in [1] whereby a constant $t(f, \delta, \epsilon)$ can be computed having the property that

$$
\Pr[t > t(f, \delta, \epsilon)] = \epsilon.
$$
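Where the tables of [1] are not at hand, $t(f, \delta, \epsilon)$ can be approximated by simulation. The following Python sketch swaps in Monte Carlo for the Johnson–Welch methods; it is not their procedure. It draws $z$ and an independent $\chi^2_f$ (via a Gamma variate with shape $f/2$ and scale 2), forms $t = (z + \delta)/\sqrt{\chi^2/f}$, and takes the empirical upper-$\epsilon$ point. The function name, the number of draws, and the seed are hypothetical choices. With $\delta = 0$ and $f = 10$ the estimate should be close to the central-$t$ value 1.812.

```python
import math
import random

def t_point_mc(f, delta, eps, draws=100_000, seed=0):
    """Monte Carlo estimate of t(f, delta, eps), where Pr[t > t(f, delta, eps)] = eps
    and t = (z + delta) / sqrt(chi2 / f)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(draws):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(f / 2, 2.0)  # chi-square with f degrees of freedom
        sample.append((z + delta) / math.sqrt(chi2 / f))
    sample.sort()
    # Roughly a fraction eps of the sample lies above the empirical (1 - eps) quantile.
    return sample[min(draws - 1, int((1 - eps) * draws))]

print(round(t_point_mc(10, 0.0, 0.05), 2))
```

Precision improves only as the square root of `draws`, so for published work the tables of [1] remain preferable.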

These relations will be used to obtain confidence intervals for $\theta_\alpha - \varphi_\beta$. The resulting confidence intervals can be used to obtain significance tests for $\theta_\alpha - \varphi_\beta$.

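Let $x_1, \ldots, x_n$ be a random sample of size $n$ from the first population while $y_1, \ldots, y_m$ is a random sample of size $m$ from the second population. Then consider

$$
t = \frac{\bar x - \bar y - (\theta_\alpha - \varphi_\beta)}{\sqrt{\dfrac{\sum (x_i - \bar x)^2 + \sum (y_j - \bar y)^2}{m + n - 2}}\,\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}.
$$

This quantity has a non-central $t$-distribution with

$$
\delta = \frac{\lambda_\beta - \lambda_\alpha}{\sqrt{1/n + 1/m}}, \qquad f = m + n - 2.
$$

For notational simplicity let

$$
t\Bigl(m + n - 2,\ \frac{\lambda_\beta - \lambda_\alpha}{\sqrt{1/n + 1/m}},\ \epsilon\Bigr) = t(\epsilon), \qquad \sum (x_i - \bar x)^2 = S_1^2, \qquad \sum (y_j - \bar y)^2 = S_2^2.
$$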

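Then one-sided confidence intervals for $\theta_\alpha - \varphi_\beta$ with confidence coefficient $\epsilon$ are given by

$$
\theta_\alpha - \varphi_\beta < \bar x - \bar y - \frac{t(\epsilon)\sqrt{S_1^2 + S_2^2}}{\sqrt{(m + n - 2)\big/\bigl(\frac{1}{n} + \frac{1}{m}\bigr)}},
$$

$$
\theta_\alpha - \varphi_\beta > \bar x - \bar y - \frac{t(1 - \epsilon)\sqrt{S_1^2 + S_2^2}}{\sqrt{(m + n - 2)\big/\bigl(\frac{1}{n} + \frac{1}{m}\bigr)}}.
$$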

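Two-sided confidence intervals for $\theta_\alpha - \varphi_\beta$ with confidence coefficient $1 - (\epsilon_1 + \epsilon_2)$ are given by

$$
\bar x - \bar y - \frac{t(\epsilon_1)\sqrt{S_1^2 + S_2^2}}{\sqrt{(m + n - 2)\big/\bigl(\frac{1}{n} + \frac{1}{m}\bigr)}}
< \theta_\alpha - \varphi_\beta <
\bar x - \bar y - \frac{t(1 - \epsilon_2)\sqrt{S_1^2 + S_2^2}}{\sqrt{(m + n - 2)\big/\bigl(\frac{1}{n} + \frac{1}{m}\bigr)}},
$$

where $\epsilon_1 + \epsilon_2 < 1$.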
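The two-sided limits can be computed mechanically, as in the following sketch. It is our illustration, not from the note. It takes $\lambda_\gamma$ from the standard normal inverse c.d.f. in Python's `statistics.NormalDist`, and forms $\delta$, $f$, $S_1^2 + S_2^2$ and the two limits exactly as displayed above. The non-central percentage point $t(f, \delta, \epsilon)$ is not computed here. It must be supplied by the caller through the hypothetical argument `t_point`, for example from the tables of [1]. In the usage example $\alpha = \beta = 50$, so $\delta = 0$ and $t(3, 0, \epsilon)$ reduces to the ordinary $t$ point, $\pm 3.182$ for $\epsilon = 0.025, 0.975$. The data are hypothetical.

```python
import math
from statistics import NormalDist, fmean

def walsh_limits(x, y, alpha_pct, beta_pct, t_point, eps1, eps2):
    """Two-sided limits for theta_alpha - phi_beta with confidence coefficient 1 - (eps1 + eps2).

    t_point(f, delta, eps) is caller-supplied and must return t(f, delta, eps),
    i.e. Pr[t > t(f, delta, eps)] = eps for the non-central t.
    """
    n, m = len(x), len(y)
    std = NormalDist()
    lam_alpha = std.inv_cdf(alpha_pct / 100)  # lambda_alpha
    lam_beta = std.inv_cdf(beta_pct / 100)    # lambda_beta
    f = m + n - 2
    delta = (lam_beta - lam_alpha) / math.sqrt(1 / n + 1 / m)
    xbar, ybar = fmean(x), fmean(y)
    s_sq = sum((v - xbar) ** 2 for v in x) + sum((v - ybar) ** 2 for v in y)  # S_1^2 + S_2^2
    scale = math.sqrt(s_sq) / math.sqrt(f / (1 / n + 1 / m))
    lower = xbar - ybar - t_point(f, delta, eps1) * scale
    upper = xbar - ybar - t_point(f, delta, 1 - eps2) * scale
    return lower, upper

def t_point_central_f3(f, delta, eps):
    # Placeholder valid only for f = 3, delta = 0: central t points +/-3.182 at eps = 0.025 / 0.975.
    assert f == 3 and abs(delta) < 1e-12
    return 3.182 if eps < 0.5 else -3.182

x, y = [1.0, 2.0, 3.0], [0.0, 2.0]
lower, upper = walsh_limits(x, y, 50, 50, t_point_central_f3, 0.025, 0.025)
print(round(lower, 3), round(upper, 3))
```

With $\alpha = \beta = 50$ this reproduces the familiar two-sample $t$ interval for $\mu - \nu$. Other percentage points change only $\delta$, and hence the value that `t_point` must return.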


REFERENCE 

[1] N. L. Johnson and B. L. Welch, “Applications of the non-central t-distribution,” Biometrika, Vol. 31 (1940), pp. 362-389.
THE TEACHING OF STATISTICS

A report of the Institute of Mathematical Statistics Committee on the 
Teaching of Statistics¹

PREFATORY NOTE 

This report on the teaching of statistics contains two parts. Part I is a summary of the conclusions reached by the committee concerning the appropriate
content and organization of teaching in statistics. It is oriented towards the 
future, and is intended as a program for action. Part II, mainly the work of the 
chairman of the committee, is a more intensive discussion of the general problem. 
It surveys the present state of the teaching of statistics, probes some of the 
reasons for existing weaknesses in this teaching, and states more fully the basis 
for the conclusions summarized in Part I. 

Additional material, with special reference to applied statistics, is contained in a report of The Committee on Applied Mathematical Statistics of the National Research Council, entitled Personnel and Training Problems Created by the Recent Growth of Applied Statistics in the United States.²

PART I 

SUMMARY OF CONCLUSIONS

1. Who are the prospective students of statistics? A complete teaching program in statistics must be designed to meet the needs of four principal categories
of students, listed here according to the amount of training in statistics that is 
needed to meet their requirements. 

a. All college students. Statistical method is a vital branch of scientific 
method. It is widely used in most sciences, business, government, and ordinary 
life. Some understanding of the nature of inductive inference from quantitative 
data on the basis of the theory of probability as portrayed in statistical method 
is an indispensable part of a liberal education. 

b. Future consumers of statistics. Some students will specialize in administration, business, or other subject-matter that will require them to understand the results of statistical analyses of special problems, although they themselves do not make these analyses. For example, business executives and government administrators must frequently base action on statistical studies. Research workers and teachers in many fields may not themselves use statistical methods, yet in order to keep abreast of their own or cognate fields they must read and understand studies using statistical methods.

c. Future users of statistical methods. A still smaller group of students of

1 The Committee consists of Harold Hotelling, Chairman; Walter Bartky, W. Edwards 
Deming, Milton Friedman, and Paul Hoel. 

2 Copies may be obtained from the National Research Council, 2101 Constitution Ave.,
Washington 25. 
statistics are training themselves for careers of specialization in economics, population, sociology, housing, business, business research, industrial design, industrial production, personnel, purchasing, public opinion, biology, agricultural science, metallurgy, physics, chemistry, psychology, or some other field that makes extensive use of statistics. Research in these fields often requires the use of advanced statistical techniques, and even the development of new statistical theory. Students planning to do such research need statistical theory and methods as a tool.

d. Future producers and teachers of statistical methods. The smallest, but in 
many respects most crucial group of students of statistics, are those who intend 
to specialize in statistical methods for the sake of statistical methodology. 
Many of these will become teachers or full-time research workers, though some 
will find posts in government and industry in high-grade statistical work, frequently requiring the development of new statistical theory and methods.
These students will become tool-makers. 


2. What should they be taught? 

a. All college students.³ The fundamental logic and philosophy of statistics can be taught at an early stage. It is perhaps an appropriate subject to include in the kind of survey courses of physical or social sciences that have become so common in recent years. Three or four weeks of lectures and discussions should suffice to acquaint the students with the broad principles of inductive inference. No mathematics need be included, although some elementary experiments may well be performed to instil the concepts of sampling variation, randomness, and statistical predictability. The student even at this stage can be made to recognize the fundamentally statistical character of most decisions, arising from the fact that they involve an element of uncertainty and a balancing of the importance of different types of errors. The student can be made to understand the fundamental difference between inductive and deductive statements, the nature of statistical estimation, and the nature of a statistical hypothesis. These concepts can be made concrete by illustrating them in terms of problems ranging from everyday questions such as whether to cross a street in the middle of the block on up to such vital problems as the construction of an appropriate social security plan, or the design of an efficient experiment for selecting the best variety of corn, or the selection of the best method of testing for the presence of a disease.

b. Future consumers of statistics. Future consumers of statistics need two
this need. In addition, they need a reasonably thorough understanding of what statistics can and cannot do, what the major statistical techniques are, and how to interpret the results obtained by the application of such techniques. This need may be met for those students who have some mathematical background by all or part of the fundamental one-year course discussed in the next section. For students lacking this background, special courses along similar lines will be required.

c. Future users of statistical methods. It is essential for fruitful application that users of statistical methods should not mechanically apply procedures learned by rote or taken from a manual. Since few research problems fit perfectly into clearly defined patterns, nothing is so important to the successful collection and analysis of statistical data as adaptability and flexibility in using techniques. These require a thorough comprehension of the logical foundations of statistics, especially of the assumptions underlying its various technical devices, and sufficient knowledge of the derivations of these devices to be able to adapt them to the special circumstances that inevitably develop. To provide this background, a minimum of a full-year fundamental course in statistical methods is essential, followed by courses of application. It is highly desirable that this fundamental course be based on calculus as a prerequisite, because without it a proper understanding of the development of statistical techniques cannot be attained. But this is probably impossible at present, in view of the unfortunately low level of mathematical training of most college students. As an expedient, and it is hoped a temporary expedient, it is recommended that the fundamental course be given in two sections, one requiring calculus, the other only a knowledge of first-year college algebra. A single course (or pair of courses, in line with the temporary expedient just mentioned) should suffice for all departments, because the core of statistical methods is common to all fields of study. Given in this way, the fundamental course can have the advantage of being taught by the most competent statisticians in the institution.

In addition to a thorough training in theory and methods, users of statistical 
methods need training in applications. This can be provided by courses in 
various applied fields. It is usually advisable that these courses be given in the 
department of application (agriculture, population, engineering, economics, 
psychology, etc.), and require the fundamental one-year course as a prerequisite. 

d. Future research workers and teachers of statistical method. The future research workers and teachers of statistical method clearly require far more intensive training in theory than has so far been suggested. A fundamental prerequisite to such training is knowledge of some advanced mathematics. It is difficult to specify exactly what or how much mathematics is necessary, but something of the algebra of matrices and of the theory of functions are minimum necessities, and a good deal of additional knowledge of algebra, geometry, and analysis adds richness and power to the work of the statistical theorist.

In addition to advanced mathematics and advanced work in statistical method, 
the future statistical theorist needs a good deal of work on applications, in the 
form either of experience or courses. He will be a tool-maker, and needs to 
know by personal experience something of the problems of those who use his tools. One satisfactory arrangement is an internship in statistical research, as is currently provided by some institutions. By this arrangement, interns work under competent leadership in various government or private agencies that are engaged in large-scale statistical studies. The interns do research in theory, adapt the physical circumstances to theory and vice versa, and have actual practice in the design of experiments, the construction of questionnaires, writing of instructions, planning tabulations, analyzing the results, and examining sampling variances.

It is obvious that proper advanced courses in statistics will for many years be 
the province of a few institutions only, as there does not exist at present an adequate professional body to man more than a few.


3. Who should teach statistics? It is clear from the preceding section that two different kinds of courses are required to meet the needs of students of statistics: first, courses in statistical method and methodology; and second, courses in applications of statistical methods to particular fields.

The most important requirement for a successful university program in statistics is that courses in statistical method and methodology should be taught by a statistical theorist, a man who has had the training outlined in Art. 2d above, is specializing in statistics, is doing research in statistical method, and who has had some first-hand acquaintance with applications of statistical techniques. This is the only way such courses can be kept abreast of developments and sufficiently broad to meet the needs of all departments. This recommendation
may seem to belabor the obvious, but a glance at the qualifications of most 
people currently teaching statistical methods will show why it is necessary. 

Most courses in applications should be taught by people thoroughly conversant with the relevant subject-matter fields as well as statistical methodology. Some courses in applications may be taught by statistical theorists, particularly new applications or applications that are common to many fields.

4. How should the teaching of statistics be organized? The teaching program in statistics should be organized around a separate administrative unit, an Institute or Department of Statistics. This department should be primarily responsible for the teaching of courses in statistical methods: the fundamental course
through an associated research staff, special assignments involving the application of statistical methods to concrete problems.

Intermediate courses dealing primarily with applications ordinarily belong in other departments (agriculture, economics, demography, engineering, biology, etc.), although some may be given in the department of statistics. The exact location of courses in application will depend on the accident of the departmental affiliation of the persons competent to teach them. Coordination of the teaching program in statistics can be achieved by an interdepartmental committee. The department of statistics should not, however, consist of such a committee under a different name. It should be a thoroughly independent department, with all or most of its members entirely in the department.

The recommendation that the responsibility for teaching statistical methods be 
centered in a separate department is based on the belief that the teaching of 
statistical methods without theory can only be uninspiring and harmful; that a 
separate department of statistics offers the only arrangement that can assure 
statistical theory being taught by competent theorists, and the only satisfactory 
arrangement for ensuring the strong incentive for statistical research, with appropriate recognition and advancement, which is as necessary for the teaching of
statistics as for the teaching of any other subject. 

5. What should be done about adult education? The preceding recommendations are all directed toward the teaching of statistics to undergraduate and graduate students. There is an additional need that these do not meet, namely, the provision of training to mature research workers in various fields already established in their professions. This need arises in part from the inadequate teaching of statistics in the past, but even more from the extremely rapid advance in the theory and practice of statistics, which has made it difficult for any but the specialist to keep abreast of developments. Some institutions are making efforts to meet this need by providing evening and late-afternoon classes for employed research workers. Such classes are feasible only in the larger centres of statistical activity. There is also the need of providing advanced research workers in particular fields with highly specialized guidance in selected topics. A department of statistics organized along the lines suggested above can contribute toward meeting this need by effective counseling of colleagues in other departments, and by organizing special seminars and lectures for them. The professional statistical associations are also contributing by arranging special expository programs.
A. MINOR NUISANCES AND INEFFICIENCIES IN STATISTICAL TEACHING

6. Lack of coordination among departments. Lack of advanced courses and laboratory facilities. The teaching of statistics in American colleges and universities, which has for the most part been a development since the first world war and has now reached large proportions, presents a number of unsatisfactory features. Courses in statistical methods are taught in various departments without coordination or inter-communication. These courses cover what is to
illustrative examples drawn in each case from material pertaining to the department in which the course is taught. Thus a student desiring to learn more about statistics than he can obtain in one department must, in taking courses in other departments, repeat a great deal of what he has previously covered.

There is a plethora of elementary courses and a dearth of advanced ones. 
Some departments have excellent statistical laboratories which they reserve for 
the use of their own students, each with an attendant to keep others away, while 
other departments have none. Some classes in elementary statistics are too large and some too small, with no one in a position to equalize the sections between different departments.

7. Inefficient decentralization of libraries. The library situation is confused. Books on statistical methods are catalogued and shelved under Sociology, Economics, Business, Psychology, Zoology, Botany, Engineering, and Medicine. Books on probability are divided between Philosophy, Mathematics, Physics, and Chemistry. Books on the method of least squares are for the most part divided between Mathematics, Astronomy, and Civil Engineering, though some get into the Economics, Geology, and Physics reading-rooms. Works on the analysis of variance and design of experiments are likely to be concentrated under Agriculture, while methods of approximate evaluation of multiple integrals and similar purely mathematical subjects of use in statistics are, at least in one of our largest universities, to be found only in the library of Biology.

B. THE MAJOR EVIL: FAILURE TO RECOGNIZE STATISTICAL METHOD AS A
SCIENCE, REQUIRING SPECIALISTS TO TEACH IT 

8. Too many teachers not specialists. The above nuisances are but minor. The major evil is that those attempting to teach statistical method are all too often not specialists in the subject. Their original selection was seldom on the basis of scholarship in this field; they are not encouraged to make advanced studies in it; and their environment is such as to draw their attention in every direction except to the central truths and problems of their science. Frequently they lack the knowledge of mathematics necessary to begin to read the more serious literature of the subject that they are teaching. Many have been utterly unable to keep up with the rapid progress which has been taking place in statistical methods and theory, progress which affects even the most elementary things to be taught.

9. Results: students ill equipped. There results a widespread teaching of 
wrong theories and inefficient methods. Students are sent to the government 
service and to industrial and commercial statistical positions equipped with the 
skill that results from careful drilling in methods that ought never to be used. 
Some of these same students are encouraged and assisted to become college and university teachers of statistics without ever making thorough-going studies of the fundamentals of the subject, or exhibiting any power of making original contributions to it, or studying any graduate mathematics. Through the method of selection of teachers in general use, and through textbooks written by individuals of this type, there is a perpetuation of obsolete ideas and unsound methods.

All this does not mean that any considerable number of people teaching statistics are unworthy or objectionable members of the academic community. Many, indeed, are of superior intellect, upright character, personal charm, and undoubted teaching ability. Some are making creative contributions to other subjects. The only trouble is that they are teaching a subject in which they are not specialists, and which progresses so fast that only specialists can keep up with it.

10. Reasons why teachers of statistics are often not specialists. The chief 
reasons for the extensive teaching of statistical method by people who are not 
specialists in it appear to be the following: 

a. The rapid growth of the subject and multiplication of its applications, creating a very large and very urgent demand for teaching it that could not be met immediately by the small existing number of scholars specializing in statistical method. This difficulty is aggravated by the paucity of university facilities for training advanced scholars in the field, so that even now the available number of such scholars cannot be expanded with sufficient rapidity to meet the current need. As specialists have not been available in anything like sufficient numbers, statistical method has inevitably been taught largely by non-specialists.

b. Confusion between statistical method and applied statistics. Statistical method is a coherent, unified science. “Applied statistics” may mean any of thousands of diverse things. Any particular study in applied statistics will ordinarily utilize some few of the results obtained by the science of statistical method, but will be largely concerned with matters peculiar to the particular application in view and others closely related to it. For example, studies of business cycles utilize statistical methods, good or bad, with a view to drawing inferences from existing data on prices, production, incomes, interest rates, bank reserves and the like. The main job of the applied statistician in this field is to study the sources and nature of the various series of observations, keeping in mind incidental events which may break the continuity of a series, and watching, with a background of economic theory and knowledge of the facts, for explanations. He should also be well acquainted with statistical theory, since otherwise there is grave danger of wasting or misinterpreting the laboriously accumulated observations. Indeed, an organization studying business cycles, or solar cycles, or rat psychology or cancer or practically anything else, would almost certainly benefit from participation by a specialist in statistical method.

However, the chief attention in any such study will not be on statistical method but on features peculiar to its own scope. The specialist in statistical method will do well to participate occasionally in such a study, but if he does so too extensively the needs of the application will so engross his attention that he cannot keep up with the progress of statistical method itself.

The call of applications is enticing, and has led many young scholars to forsake the cultivation of statistical theory. The applications have benefited greatly by the process. Moreover, problems brought back in this way from applications have provided valuable inspiration in developing theory. The mistake lies in supposing that participation in applied statistics is equivalent to specialization in statistical method and theory, and the consequent appointment to teach the latter of persons whose sole concern is with the former.

c. Failure to recognize the need for continuing research in the theory of statistics
by those who teach it. There is an easy tendency to assume that all the requisite
ideas and formulae can be found in some book, and that the duty of the teacher
of statistics is simply to transfer this established book-knowledge to the minds
of the students and impart to them skill in applying it. Similar attitudes
applied to other subjects have in the past been a drag on progress, and have long
been discarded in respectable universities. They still hang on, however, even
in the best institutions, with respect to statistics. The spectacular advances of
the last three decades in statistics should make it clear to anyone who has followed
them that statistical method is far from static, that the best techniques of
present-day statistics may tomorrow be replaced by something better, and that
unsolved problems regarding the theory and methods of statistics are sticking out
in every direction. A vast amount of research, mostly of a highly mathematical
character, is needed and is in prospect. Anyone who does not keep in active
touch with this research will after a short time not be a suitable teacher of
statistics. Unfortunately, too many people like to do their statistical work as they
say their prayers, merely substituting in a formula found in a highly respected
book written a long time ago.

d. The system of making appointments to teach statistics within particular
departments that are devoted primarily to other subjects. In effect, the teacher of
statistical method is too often selected by economists or sociologists or engineers or
psychologists or medical men because he is to teach in one of these departments.
Thus the task of selection devolves upon people unacquainted with the subject,
though realizing the need for it in connection with a very specific application.
Under such conditions there is an inevitable tendency to emphasize the
immediately practical and specific at the expense of the fundamental work of wider
applicability and greater long-run importance. Confusion between a science and
its applications is most pronounced with those who know little about it, and the
distinction between statistical method and applied statistics is likely to be
completely lost when a sociologist or an engineer is confronted with the problem of
finding someone to teach statistics. If he does make the distinction at all he is
likely to choose in favor of applied statistics.

Strangely, the actual teaching that ensues is bound to consist largely of
statistical theory, because the students will ordinarily not have had statistical theory
elsewhere, and they must have some in order to apply it. What often happens is
that a sociologist or an engineer who has made some study of statistics embarks
on what he thinks will be a career of teaching the application of statistical method
to sociological or engineering problems, only to discover that because of the
ignorance of the students he is compelled to teach the fundamentals of statistics,
an entirely different subject for which he lacks preparation, talent, and interest.

An incident of this sort has been cited previously.⁵ A prominent economist
was asked to teach a course entitled “Price forecasting” in a leading university,
and accepted. He found, however, that his lectures on this subject were over
the heads of the students because he was using statistical concepts unfamiliar to
them. He therefore went back over the ground covered so as to explain these
particular statistical concepts along with their application. But in explaining
them he found himself using other statistical concepts, which in turn called for
explanation. At the end of the semester he found that he had not given the
course in price forecasting which he had planned, and for which the large class had
enrolled, but instead had taught a somewhat disordered course in elementary
statistics, a subject in which he did not feel particularly competent, and for which
the students had not come. When he was asked to teach price forecasting a year
later he proposed that a prerequisite of a course in statistics be imposed, but this
proposal was rejected by the chairman of the department, and the course was not
repeated.

5 Harold Hotelling, “The teaching of statistics,” Annals of Math. Stat., vol. xi, 1940,
pp. 457-470.

11. Appointments under the existing system are not all bad. More by
accident than by design in the existing system, not all statistical appointments by
departments of application are bad. Some professors in these departments make
conscientious excursions into statistical theory, are well advised by competent
specialists in statistics, and bring about the appointment of men of high quality
well acquainted with statistical method and theory of the currently best sort.
This may work out well if the man so appointed is an able and energetic scholar
deeply devoted to his subject, if he is placed immediately in the highest
professorial rank, and if he does not feel under obligation to devote himself too
exclusively to the special interests of the department of which he finds himself a
member. He is then free to pursue his specialty, to keep informed on the latest
developments in statistical method and himself to add to the subject, while at
the same time transmitting to students a well rounded and up-to-date selection
of knowledge. It is in this way that some of the present leaders in statistics have
developed. It is a wrong procedure, however, to depend on accidents of this
kind.


The system of departmental organization and of making appointments and
recognizing proficiency in the teaching of statistics needs to be altered. The
usual story is typified by the appointment of a promising young scholar in
statistical method to a junior position in some department of application where he
is expected to work on problems and to teach statistical methods with a sole eye
to the work of the specific department. He is then under pressure to concentrate
on a particular kind of applied statistics, for his advancement will depend, not
on his statistical attainments at all, but on his study of the literature,
terminology, techniques and theories of the application. His usual associates will be
in the department in which he is teaching rather than others teaching statistics.
The loss, although not total, is great, because the opportunity to make the most
of the man's statistical ability is lost, and his ability as an economist, agricultural
scientist, engineer, or something else that he is not particularly fitted for, is

A still less favorable circumstance, and unfortunately more common, is that in
which the teacher of statistics is not even selected for scholarship in the theory
of statistics. Studies in some other field, with some slight dabbling in the
application of statistical methods to it, plus a pleasing personality, have all too
frequently been thought to comprise sufficient qualifications for teaching statistical
methods and theory.

12. Unsatisfactory texts. The uncritical character of the teaching is reflected
in the long line of textbooks written by teachers who have not made any
genuinely fundamental study of statistics, but pass on to students in a magisterial
fashion what was passed on to them. Authority takes the place of derivations
and ultimate sources. It is no wonder that these textbooks, copied from each
other, contain increasing accumulations of errors, or that long delays have
intervened between the introduction of important new statistical methods and theories
in the periodical literature and their appearance in the textbooks and courses
put before students.

The latest discoveries in the theory of statistics affect what should be taught
in elementary courses, and no syllabus can be expected to survive more than a
few years of research. The development of new statistical methods and ideas
of overwhelming importance must be allowed to compete with material already
well established as true and useful. The new material is equally true and in
some cases even more useful than matter usually incorporated in the best of
current courses and textbooks.

13. Omission of probability theory from texts and teaching. One of the
important weaknesses in much of the current teaching of statistics is a failure to
make proper use of the theory of probability. Without probability theory,
statistical methods are of only minor value, for although they may put data into
forms from which intuitive inferences are easy, such inferences are very likely to
be incorrect. The objective weighing of the degree of confidence to be placed in
inductive conclusions is necessary to avoid fallacies. Indeed, the whole
foundation of descriptive statistical methods, of inductive inference, and of the design
of experiments, rests upon probability theory.

The relevance of probability to much statistical work was indeed questioned a
quarter-century ago by a group of economists impressed by the lack of
independence between consecutive observations, and this attitude, in conjunction with
an exaggerated and belated remnant of nineteenth-century empiricism, has had
a certain influence, particularly on the statistical methods in use by economists.
This view is now rapidly giving way to a tendency to use the powerful new
statistical methods discovered in the meantime. It is now perceived that efficient
objective methods can be used over a much wider range of cases than was formerly
supposed, because the independence assumed in their derivations refers not to
observations but to residuals from the theoretical model used. Furthermore,
research is under way, and has already achieved promising results, on the
extension of accurate methods to still more extensive classes of problems.

C. PROPER QUALIFICATIONS OF TEACHERS OF STATISTICS 

14. Statistics compared with other subjects. The qualifications appropriate
for teachers of statistical method and theory are not essentially different in degree
from those for teachers of other subjects in the same institutions; proficiency in
statistical method and theory is merely to be substituted for proficiency in other subjects.
This substitution is, however, vital. It must not be imagined that proficiency
in some other subject in which statistical methods are used incidentally is
equivalent to proficiency in statistical method itself. The error of such a supposition,
if carried over into another field, might lead to the appointment of a man as
professor of chemistry on the ground that he could cook.

The first requisite of the college or university professor of any subject is a
profound and thorough knowledge of that subject. It is customary in the better
institutions at least to restrict appointments to the rank of assistant professor to
persons who have demonstrated scholarly qualifications by work equivalent to
that leading to a Ph.D. degree, including an original contribution to the subject
that the individual is to teach. Promotion to the higher ranks is conditioned
upon a number of criteria, among which published research is by far the most
important in those institutions.

15. Current research in statistical method is essential for teachers. Research
is even more essential in the teacher of statistics than in teachers of most other
subjects, because so much remains to be worked out that is of immediate
importance. Some college teachers do no research. This is usually regarded as
deplorable. The evil is, however, of quite different magnitude according to the
nature of what is taught by such teachers. In a new subject in which sharp
differences of opinion exist or have recently existed on fundamental questions,
in which current discoveries have an important bearing, and in which there have
not yet been the time and consensus necessary for the preparation of an adequate
and virtually error-free textbook, teaching without research may have calamitous
effects. The effective teacher must, of course, have teaching ability, but no
skill in pedagogy, no lustre of personality, can atone for teaching errors instead
of truth. Errors are very likely to be taught by those who do no research, and
then the more skillful the pedagogic indoctrination, the greater the harm.
Sound educational policy calls for devotion to research of a large fraction of the
time and energies of the teaching staff in a subject like statistical theory.
Students also are in particular need of encouragement to do original and critical
work in relatively new areas of this kind. They must be taught to shun the
use of formulae and methods given merely on authority without full and
convincing reasons, and to insist on looking closely and critically at assertions.

Even in the teaching of elementary statistical methods for direct practical
use by specific occupational groups, where it might be thought that the teaching
would most predominate over the research element, the teacher must face
difficult questions whose answers call for research in statistical theory. Let us
illustrate this by one example out of the many possible. In teaching the analysis
of variance for use in agricultural experimentation, questions arising out of the
possible non-normality of the underlying distributions must be dealt with in
some way. The formulae, even those in the best textbooks, are accurate only
if the distribution is normal, and neither this fact nor the non-normality of many
distributions should be concealed from the students. Obviously something more
needs to be said on the subject at this point. What the teacher can say depends
on how deep he has gone into a whole series of perplexing questions, on some of
which the views of scholars are not yet stabilized, and on which a tremendous
amount of research is needed before the maximum practical value can be attained
for a technique whose usefulness is already amazing.
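The point about normality can be made concrete in class with a short computation. The Python sketch below is an illustration added here, not part of the report: `one_way_f` computes the one-way analysis-of-variance ratio F = [SSB/(k - 1)] / [SSW/(N - k)], whose tabulated probability criteria are exact only for normal data, and `randomization_p_value` estimates a Monte Carlo randomization (permutation) p-value, one device a teacher might set beside the tabulated criteria because it does not rest on normality. Both function names, the permutation count and the seed are assumptions chosen for illustration.

```python
import random
from statistics import fmean


def one_way_f(groups):
    """Variance ratio F = [SSB/(k-1)] / [SSW/(N-k)] for a one-way layout."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    means = [fmean(g) for g in groups]
    # Between-groups sum of squares: spread of group means about the grand mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups sum of squares: spread of observations about their group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssb / (k - 1)) / (ssw / (n_total - k))


def randomization_p_value(groups, n_perm=10_000, seed=0):
    """Monte Carlo randomization test: reshuffle the pooled observations into
    groups of the original sizes and count how often the shuffled F is at
    least as large as the observed F."""
    rng = random.Random(seed)
    observed = one_way_f(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if one_way_f(shuffled) >= observed:
            hits += 1
    # The add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


groups = [[1, 2, 3], [4, 5, 6]]
print(one_way_f(groups))              # 13.5
print(randomization_p_value(groups))  # close to 0.1
```

The randomization test trades the normality assumption for the assumption that observations are exchangeable among groups when the null hypothesis holds. For these two groups of three, 2 of the 20 equally likely splits give an F as large as the observed one, so the estimate should lie near 0.1.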

16. Minimum requirements in mathematics for the training of teachers
and research men in statistical theory. Because research in the theory of
statistics requires advanced mathematics, and is indeed largely mathematical in
character, a mastery of a substantial amount of higher mathematics must be an
essential part of the training of prospective teachers of statistics. To specify
exactly what or how much mathematics is necessary would be a difficult task.
Something of the algebra of matrices and of the theory of functions is a minimum
necessity, and a good deal of additional knowledge of algebra, geometry, and
analysis adds richness and power to the work of the statistical theorist, the
inventor of new statistical methods. On the other hand, the time of the graduate
student in statistics is much occupied with the theory of statistics itself, and some
of his time should also go into the study of applied statistics. If the students
entering a graduate school for advanced work in statistics went there equipped
with a knowledge of matrix algebra and theory of functions and some additional
higher mathematics, as is obtainable by undergraduates at some institutions,
they would have time for applied statistics and could do some real work on
applications.

There is a cruel dilemma here, resulting from the delay in learning mathematics
imposed by the elementary curricula which have become customary in this
country. The weakness of the mathematical element in the prevailing curricula
affects both teachers and students of statistics to an extent justifying some
attention from those interested in the improvement of statistics. In American
universities elementary calculus is not often taught before the sophomore year, and
the more advanced parts of algebra come still later, if at all.

If calculus could be pushed down into the high schools and assumed as a
prerequisite for college courses in mathematics, statistics, economics, physics and
several other subjects, the efficiency of instruction in all these departments could
be increased. For example, the difficulties experienced by students of economics
with ideas of marginal cost, marginal revenue and the like correspond closely
with the difficulties experienced by mathematicians for centuries in trying to
define infinitesimals and derivatives, difficulties now successfully overcome. The
student who really knows differential calculus need not have the slightest
difficulty with the marginal ideas of economics. Similarly in physics, the
fundamental concepts of speed, acceleration, potential theory, conductivity, thermal
capacity and radiation are all mathematical, and easier to grasp once and for
all as such than to be learned afresh with each new application from textbooks
in physics sometimes not clearly written and taught by teachers who must for
one reason or another avoid a mathematical approach.

The possibilities of teaching quite advanced mathematics to young children
have scarcely begun to be explored. Children of kindergarten age are fascinated
and thrilled by the wonders of topology. Groups and number theory can be
tremendous sensations in the fifth grade, though all these subjects are ordinarily
reserved for graduate students specializing in mathematics. What is lacking
is teachers who know mathematics and its applications and who possess enough
freedom to teach what they know instead of the long, dull and relatively useless
drill on problems of wallpaper hanging and the like, problems turning on mere
conventions which are quickly forgotten, painful repetitious work which makes
children resolve to quit mathematics as soon as possible.

D. NEED FOR RELATING THEORY WITH APPLIED STATISTICS

17. An example of the interaction between theory and practice. A professor
of psychology working with mental tests might enlist the assistance of a young
statistical theorist with mutual benefit. The young man might for a short time
do some of the drudgery of scoring tests and computing, passing on soon to the
problems of test construction and the distribution of various functions of
correlation coefficients. This last is on a new and exciting frontier of statistical
theory. The advancement of this frontier, which is really the main business of
the young man in his capacity as prospective statistical theorist, would in this
way come to him naturally as a problem or series of problems having a tangible
meaning additional to its mathematical content. The empirical context is in
such cases often of great value in suggesting suitable approaches, for example,
suitable approximations in the study of functions not susceptible to simple
mathematical representation in terms of elementary functions.

If the young theorist succeeds in extending the boundaries of multivariate
statistical analysis by discovering the distribution of some new function of
correlation coefficients, the chances are that this discovery will also have
applications in anthropology, medicine, banking, and other pursuits which in the
aggregate will greatly outweigh the application originally in view.

The discovery should be regarded primarily as a contribution to the general
theory of statistics, and published in a journal devoted to mathematical statistics.
It will then become available to a wide circle of teachers of statistics, who may
incorporate it into their courses, and its methods and results will be studied by
other investigators from the standpoint of possible generalizations and analogs.
The importance of the discovery would be much more limited if it were thought
of as a development in psychology and published only in a psychological journal.
Perhaps dual or multiple publication ought to be permitted in such cases, but the
first publication should be in a journal of mathematical statistics. Far too many
good statistical ideas have been buried in connexion with obscure special applications.

18. Supplying opportunities for applications in graduate studies of statistics.

The statistician who does any work in applications must know statistics as an
art as well as a science. The theoretical statistician, if he wishes to be of the
utmost use to his colleagues in other disciplines, needs to know by personal
experience something of their lives and collateral problems. Indeed, experience
with applications, and the challenge of problems arising out of applications, have
played a most important part in the development of statistical theory. It
follows that the graduate student in statistics needs contact with applied
statistics, which the institution should undertake to provide, or at least facilitate.
This need is next in importance after the needs for theoretical statistics and for
pure mathematics. The distribution of time among the three (theoretical
statistics, mathematics, and applied statistics) is hard to specify exactly, and
must in any case depend on the nature of the student’s previous work. If his
mathematical preparation has been full and rich, more time should be spent on
applied statistics in his graduate years than if he has already had substantial
contact with applied statistics in some other way but is deficient in higher
mathematics.

Applied statistics entails a somewhat detailed acquaintance with the field of
application. Such a field might be life insurance, or mental testing, or industrial
quality control, or sampling in the work of the Bureau of the Census or some
other government agency; it might be agricultural economics, or business cycles.
Proficiency in any such field calls for rather prolonged study, and it would be too
much to expect the embryo statistical theorist to reach this stage of
advancement in all subjects. He should, however, make more than a superficial study
of some chosen field of application. This study might or might not be at the
university. The requisite familiarity with applied statistics might in some cases
be acquired by work in a government bureau, or in a research organization
studying business cycles or something else involving applied statistics. What is most
desirable is that the work should have brought the student to the point both of
applying statistical methods in a reasonably effective way, and of perceiving the
limitations of existing statistical methods. Perception of existing limitations
has frequently been the germ of progress in the subject.

One satisfactory arrangement is an internship in statistical research, as is
currently provided by some institutions. By this arrangement, interns work
under competent leadership in various government or private agencies that are
engaged in large-scale statistical studies. The interns do research in theory,
adapt the physical circumstances to theory and vice versa, and have actual
practice in the design of experiments, construction of questionnaires, writing
of instructions and tabulation plans, analysis of the results and appraisal of
sampling variances.

E. RECOMMENDATIONS ON THE ORGANIZATION OF STATISTICAL TEACHING
AND RESEARCH IN INSTITUTIONS OF HIGHER LEARNING

19. Research should be encouraged; teaching schedules should not be
overloaded. Colleges and universities usually expect the members of their faculties
to engage in research as well as in teaching, the relative emphasis on these two
functions varying greatly from institution to institution and to a lesser extent
among departments within the same institution. Reasons why teachers of
statistics must do current research in order to teach the subject have already
been given in Art. 15. In the organization of statistical teaching it is thus of
extraordinary importance that colleges and universities emphasize research in
the theory of statistics as a leading part of the work of the teaching staff in this
field. Hours of teaching and other duties must be kept within such bounds as
to make research possible, the initial selection of teachers must be of persons
capable of research in statistics, and there must be provision of needed secretarial,
computational and other assistance. The library must be adequate, not only in
publications containing statistical theory, but in the larger field of pure
mathematics as well.

20. Organizing statistical service in the university. In addition to the
customary duties of teaching and research, faculty members expert in statistical
methods find that they cannot escape a third, viz., advice to their colleagues and
others regarding the statistical aspects of their problems. This often takes a
good deal of time. Clearly it is in the interest of the academic enterprise that
such services be provided. Scholars in many departments are finding that their
work is greatly improved by competent statistical advice not only in the
interpretation of their data but also in the design of their experiments and other
investigations. The provision of competent advice frequently requires extended
consideration of the general content of the problem as well as special analysis of
its statistical features. And initial advice often needs to be supplemented by
further service. The statistician, like the physician, often finds that one
interview at which a prescription is dispensed does not end the matter satisfactorily.

Teaching hours must be distinctly limited if statisticians are to be able to
render this service to the rest of the institution as well as maintain a high level
of research in their own field.

One way to handle the problem of statistical service, especially in a large
institution, is through a special organization devoted to this purpose. Such an
organization, whether called a Statistical Institute, a Department of Applied
Statistics, a Statistical Laboratory, or something else, might supply not only
advice but a more active kind of assistance, including computational and
chart-drawing services.


A statistical service organization should be removed from the teaching of
statistics only to the extent necessary to gain the advantages of some degree of
specialization and to prevent undue interruption of the teacher’s other work of teaching
and of research in theory. There are distinct advantages for all parties in a
fairly close connexion between practical statistical work, research in statistical
theory, and statistical teaching. Each of these activities benefits the others,
provided only that it does not take away from it too much time. Research in
statistical theory, like medical research, needs frequent revitalizing injections of
… the stimulus of contact with students.
The teaching of statistical method is made more vigorous both by research in
theory and by the presence of applications with which students can be
confronted. And the needs of applications are better met if through an
organization such as is here envisaged they can be brought to the attention of appropriate
specialists, and if also students can be enlisted when needed for their treatment.

A university organization dealing with statistics may properly comprise two
parts with overlapping personnel, one devoted chiefly to applied statistics, the
other to theoretical statistics. The teaching might be done by both, but at
least at the more advanced levels would be primarily the concern of the
theoretical part. Migration between the two ought to be easy and frequent, though some
individuals are so definitely adapted to one kind of work or the other as to make
it undesirable to have fixed rules calling for periodic transfers.

In smaller institutions it may not be practicable to have statistical
organizations sufficiently well staffed to provide adequate consulting service. To meet
the needs in some of these cases regional centres for advice and service in applied
statistics might be established at large universities throughout the country,
with access made readily available for sister institutions. These centres might
also carry on work in applied statistics in behalf of government agencies and other
organizations, much as various agricultural colleges have for years been carrying
on cooperative work with the federal Department of Agriculture.

The question how far, if at all, such a university centre of applied statistics
should go into the market place and engage commercially in service to business
concerns is a debatable one. While there may be favorable reactions upon
scientific work, there are grave dangers to the intellectual integrity of the
institution which need serious consideration.

21. Organization for teaching. Passing from questions of personnel and the 
research and service functions of academic statisticians to teaching itself, we have 
to consider problems of departmental organization, of course contents, of systems 
of prerequisites, and of methods of teaching. All these we consider secondary 
problems, not in the sense of being unimportant, but because we believe that 
proper solutions of them will be reached with reasonable promptness when 
personnel of the kind described in Sec. C of this report are at work in some such 
general setting as has just been described. The ideas recorded below are general 
in character and are to be regarded as a starting-point for developing a program 
in a particular institution, once suitable faculty members have been obtained. 

The teaching of statistics may be organized in any of the following ways: 

a. In a department of theory and a department of applied statistics, both
forming an Institute of Statistics.

b. In a single Department of Statistics.

c. Under an inter-departmental committee.

d. Under the exclusive jurisdiction of the Department of Mathematics.

e. It may be scattered among heterogeneous departments of application,
without formal coordination.

Only a few large institutions will be in position to adopt the first plan. It is
likely that the second will be most suitable for the majority. The third should
probably be regarded as a makeshift for the transitional period until a proper
department of statistics can be organized, a step that will not at the moment be
reasonably possible for most institutions because the right kind of scholarly
personnel does not exist in adequate numbers. It is of course possible that some
vestige of an inter-departmental committee, perhaps in the form of an Advisory
Board, might be a useful adjunct of a department of statistics in order to keep it
informed of the needs of applications. It is also possible that something of the
sort might function with respect to a department of mathematics, or any other
department. On the other hand, the desired consultations and adjustments
might be accomplished in less formal ways.

To make statistics a subdivision of a mathematics department is a solution
that will appeal to administrators desirous of keeping down the number of
departments. The subject-matter of statistics is to a sufficient extent mathematical
to give some apparent weight to this plan, and some mathematicians have the
unsound idea that any mathematician can teach statistics without specialized
study or experience in application. On the other hand, statistics has some
features uncongenial to traditional mathematics, arising partly from the urgency
of practical needs which go beyond what can immediately be provided by rigorous
mathematical theory. Again we may cite the problem in the teaching of the
analysis of variance of what to do about possible non-normality of the underlying
distribution (Art. 15). The user of this technique has the responsibility of 
verifying that the situation conforms to the assumptions, including that of nor¬ 
mality, underlying the tabulated probability criteria. But he is in a very poor 
position to do this in a large proportion of the applications actually made of the 
analysis of variance Yet the analysis of variance in some form— possibly 
through the use of rank-order numbers or through a transformation or some other 
auxiliary device—remains the one powerful means of attacking a very large and 
important class of practical situations. The practicing statistician needs to do 
some highly educated guessing on such matters—guessing that will bo assisted 
but not made determinate by knowledge of a considerable range of mathematical 
truths regarding approaches to the noimal distribution, moments of the variance- 
ratio in samples from non-normal populations, asymptotic large-sample theory, 
and other such topics. His mathematical insight needs to be supplemented by 
consideration of the particular subject-matter of application. Moreover, it is 
desirable that students of statistics have some practice with actual empirical 
data designed to develop the art of guessing in such ways. 

Another example of non-rigorous mathematics used extensively in statistics is
the use of asymptotic standard errors found by the differential method. It is
desirable that good mathematics replace bad in such connexions, though something
is to be said for the position into which some practical statisticians have been
driven, that even bad mathematics may be better than none. […] good mathematics
along these lines can come only through […] studies of statistics, though a
sufficiently interested […] may really be led by such a student of […] to
undertake and complete the necessary research. Practical needs make
approximations necessary; the goodness of a particular approximation can
often be judged adequately by a statistician familiar with the particular
application long before the heavy artillery of advanced mathematical analysis
can be brought up.

The teacher of statistics must have a genuine sympathy and understanding for
applications, and these are not possessed by many pure mathematicians, at least
in the opinion of some of those concerned with the applications; and it is this
opinion rather than the possible fact that is of interest at the moment. For so
long as such an opinion is maintained, for example by psychologists and
economists, these specialists will be suspicious that courses in statistics given by a
department consisting largely of pure mathematicians are unsuitable for their
purposes. The result is likely to be a sabotaging of attempts at centralization,
the different departments reverting to the old and ultimately objectionable
system of teaching their own separate courses in statistical methods.

These difficulties are not necessarily insuperable, and it is to be expected that
many medium-sized and small institutions will make their mathematical
departments responsible for statistical teaching. But this ought not to be done without
a consideration of the possible dangers.

22. The statistical curriculum. We next consider curricular problems. These
may be divided into those of the graduate school and those of the undergraduate
college. Those of the graduate school may in turn be divided into those of
specialization in statistics and of auxiliary teaching of statistics to students in 
other departments, such as sociology, who need to use statistical methods, have 
not studied them sufficiently as undergraduates, and cannot afford to put much 
time on them. Of these two subdivisions the number of students at present is 
greater in the second and the ultimate importance is greater in the first, because 
the whole future of statistics depends on improvement and enlargement of this 
graduate teaching. 

The incidental teaching of elementary statistical methods to graduate students 
in other subjects, without any prerequisite in mathematics or statistics, cannot 
equip these students with a command of the subject at all comparable to that 
which could be obtained by a better integration of undergraduate with graduate 
work. A prospective sociologist, economist, psychologist, or physicist ought to 
study elementary statistical methods and concepts while still an undergraduate, 
and without special reference to his ultimate field of specialization. 

The features of statistical methods peculiar to their applications, beyond what
is taught through illustrations and exercises in an elementary course, may be
fit material for a course, graduate or undergraduate, in a department of the
application. Such a course should require as a prerequisite an elementary course 
in a department of statistics, or at least one taught by specialists in statistical 
method and theory. 

For the undergraduate college, in place of the sporadic offerings now current in
different departments, we recommend a combination of two general fundamental
courses with a number of advanced courses. Of the latter some will be specialized
to the work of particular departments or groups of departments.



114 


the teaching of statistics 


Of the two fundamental courses one will require calculus as a prerequisite, the 
other only a knowledge of first-year algebra. It is to be hoped that the less
mathematical of those two general statistical courses, instead of being elected by 
a majority of students, will gradually approach extinction, while the course based 
on calculus will become the vital point of contact of the student body with the 
concepts of statistics. The chief reason for insisting upon the importance of 
calculus as a prerequisite is simply the possibility of covering important statistical 
theory that is inaccessible to those who do not have it. 

Modern statistical methods are based on the theory of probability. The 
general courses in statistics may therefore well begin with elementary probability. 
The duality between probability and statistical concepts, for example between
probability and relative frequency, between mathematical expectation and a
sample mean, between parameter and statistic, should be explained. Derivations
and the place of the normal distribution should be sketched, and the Student
distribution should be derived and applied to a variety of problems in the first
course based on calculus. Later courses given by the department of statistics,
or whoever specializes in statistical theory, will naturally cover other statistical
methods and theories. At the same time useful courses can be offered in economic
statistics, mental testing, and other fields using statistical methods by
specialists, regardless of departmental affiliation. There might be departmental
cooperation; for example, the department of statistics might offer elementary and
advanced courses in correlation and multivariate analysis, and the department of
psychology might require these as prerequisites for some of its work in mental
testing.

The teaching of statistics should be accompanied by considerable work in
applied statistical problems, as well as exercises in mathematical theory, on the
part of the students. A large part of this work in applied statistics is best
conducted in a laboratory equipped with calculating machines, mathematical tables,
drafting instruments, and other appurtenances.

Statistical laboratories require supervision, administration and maintenance. 
They are needed not only for the purpose of teaching statistics, pure and applied, 
at all levels, but also by research workers in many fields. There are possible
gains of efficiency and economy in a centralized administration of them. One
suggestion is that they be under the supervision of the university library.
Another is that responsibility for them be lodged in a central department of
statistics, or in a two-department statistical institute. Centralization can be
carried too far, and it is likely that some units in a large organization will find
it advantageous to have machines which are exclusively their own. The
conflicting claims regarding machines and laboratories will require careful weighing.

23. Statistical method as a part of a liberal education. A question may also
be raised as to whether some work in the statistical method should not be
required of all college students as a part of a liberal education. This would be
a novel step, but has much to be said for it in view of the widespread use of
statistics and growing interest in statistics. Another point is that the student
who can’t make up his mind as to his ultimate field of specialization or vocation
will do well to study those things that can be used in many fields. Of such things,
mathematics and statistics are leading examples. There are more or less sound
objections to systems of required studies, but if we are to have them, the claims
of statistics should not be rejected merely on grounds of novelty.

7 Cf. the article “Frequency distribution,” Encycl. of the Social Sciences (1931),
1. The Performance Characteristic of Certain Methods for Obtaining Confidence
Intervals. B. M. Bennett and J. Neyman, University of California,
Berkeley.

Certain methods for obtaining confidence limits have been introduced by Bliss, R. A.
Fisher and Paulson. Thus, e.g., let $x_i, y_i$ $(i = 1, \ldots, n)$ represent a sample from a bivariate
normal population with means $E(x_i) = \xi$, $E(y_i) = \alpha\xi$ and variances and covariance
$\sigma_x^2, \sigma_y^2, \sigma_{xy}$. If $\bar{x}, \bar{y}, S_x^2, S_y^2, S_{xy}$ are the sample means, variances and covariance
respectively, then in order to determine confidence limits for $\alpha$, the ratio

$$u = \frac{\sqrt{n}\,(\bar{y} - \alpha\bar{x})}{\sqrt{S_y^2 - 2\alpha S_{xy} + \alpha^2 S_x^2}}$$

may be referred to the appropriate percentage point $t$ of Student's distribution. The inequality
$|u| < t$ may, in general, be solved as a quadratic equation in $\alpha$ to yield two values
$\underline{\alpha}, \overline{\alpha}$ which are presumed to be confidence limits for $\alpha$. In this paper the probability $\pi$ of being
correct in using such a procedure, i.e., the performance or operating characteristic, is
computed in the limiting case when $\sigma_x^2, \sigma_y^2, \sigma_{xy} = \rho\sigma_x\sigma_y$ are assumed to be known. It is shown that
$\pi$ is a function $\pi(\alpha, \xi, \sigma_x, \sigma_y, \rho)$ of all the parameters, and in particular of $\alpha$ itself, the quantity
for which confidence limits are supposed to be provided. Similar “quadratic” methods
are also used in certain regression problems, e.g., in determining confidence limits for a
value of $\xi$ corresponding to an additional value of $y$ when a previous sample regression of $y$
on $x$ is available; or in determining confidence limits for the intersection point of two
population regression lines. The performance characteristic of each of these methods is shown
to be a function of the quantity for which the method gives confidence limits.
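As a sketch of the “quadratic” method described above, the function below squares $|u| = t$ and solves the resulting quadratic $A\alpha^2 + B\alpha + C = 0$ for $\alpha$. The names `quadratic_limits` and `u_stat`, and the summary statistics, are hypothetical; the percentage point $t$ is supplied by the caller rather than computed. When the leading coefficient $A$ is positive, the set $|u| < t$ is the interval between the two roots; when $A$ is negative it is not a finite interval, which is one way the procedure can misbehave.

```python
import math


def quadratic_limits(n, xbar, ybar, sx2, sy2, sxy, t):
    """Solve |u| < t for alpha, where
    u = sqrt(n) * (ybar - alpha * xbar) / sqrt(sy2 - 2 * alpha * sxy + alpha**2 * sx2).

    Squaring gives A*alpha**2 + B*alpha + C < 0. Returns (lo, hi, bounded):
    if bounded is True the region is lo < alpha < hi; if False it is the
    complement of [lo, hi]. Returns None in the linear case (A == 0) or when
    there are no real roots (the region is then the whole line).
    """
    t2 = t * t
    A = n * xbar ** 2 - t2 * sx2
    B = -2.0 * (n * xbar * ybar - t2 * sxy)
    C = n * ybar ** 2 - t2 * sy2
    disc = B * B - 4.0 * A * C
    if A == 0 or disc < 0:
        return None
    root = math.sqrt(disc)
    lo, hi = sorted(((-B - root) / (2 * A), (-B + root) / (2 * A)))
    return lo, hi, A > 0


def u_stat(alpha, n, xbar, ybar, sx2, sy2, sxy):
    """The ratio u from the abstract, evaluated at a trial value of alpha."""
    return math.sqrt(n) * (ybar - alpha * xbar) / math.sqrt(
        sy2 - 2 * alpha * sxy + alpha ** 2 * sx2)


# Hypothetical summary statistics; 2.093 is roughly the two-sided 5% point of t with 19 d.f.
stats = dict(n=20, xbar=5.0, ybar=10.0, sx2=1.0, sy2=4.0, sxy=1.0)
lo, hi, bounded = quadratic_limits(t=2.093, **stats)
print(lo, hi, bounded)
```

At each root $u^2 = t^2$ exactly, so evaluating `u_stat` at `lo` or `hi` returns $\pm t$, which is a convenient check on the algebra.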

2. Some Further Results on the Bernoulli Process. T. E. Harris, Douglas
Aircraft Co.

Let $z_1, z_2, z_3, \ldots$ be a sequence of random variables defined as follows: $P(z_1 = r) = p_r$,
$r = 0, 1, 2, \ldots, k$. If $z_n = 0$, $z_{n+1} = 0$. If $z_n = r$, $r \neq 0$, then $z_{n+1}$ is distributed as the
sum of $r$ independent random variables, each having the same distribution as $z_1$. It is
assumed that $\mu < 1$, where $\mu = E(z_1)$. Let $N$ be the smallest value of $n$ such that $z_{n+1} = 0$.
A method is given for obtaining an expansion of the moment-generating function of $N$.
In the case where $p_r = 0$ for $r \ge 3$, this expansion takes the form $1 + (1 - e^{-s})(1 - p_0)F(s)$,
where $F(s) = f_1(s) - p_2(1 - p_0)f_2(s) + \cdots$, the $f_\nu(s)$ being […]. Certain restrictions on
the constants $p_r$ insure that this expansion converges for a complex neighborhood of $s = 0$.
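The distribution of $N$ can also be evaluated numerically by iterating the probability generating function $f(x) = \sum_r p_r x^r$ of $z_1$, since $P(z_n = 0)$ is the $n$-fold iterate of $f$ at $0$. The sketch below assumes, as one reading of the definition above, that $n$ ranges over $0, 1, 2, \ldots$, so that $P(N \le n) = P(z_{n+1} = 0)$. It does not reproduce Harris's expansion; it only computes the same quantities directly, and it truncates the moment-generating function at `n_max` terms, which is adequate only for $s$ well inside the region of convergence. The function names and offspring probabilities are hypothetical.

```python
import math


def pgf(p, x):
    """Probability generating function f(x) = sum_r p[r] * x**r of z_1."""
    return sum(pr * x ** r for r, pr in enumerate(p))


def extinction_time_pmf(p, n_max):
    """Return [P(N = 0), ..., P(N = n_max)], using P(N <= n) = P(z_{n+1} = 0)."""
    pmf = []
    prev = 0.0            # P(N <= n - 1); zero before the first step
    x = pgf(p, 0.0)       # P(z_1 = 0) = f(0)
    for _ in range(n_max + 1):
        pmf.append(x - prev)
        prev = x
        x = pgf(p, x)     # P(z_{m+1} = 0) = f(P(z_m = 0))
    return pmf


def mgf_of_N(p, s, n_max=500):
    """Truncated E[exp(s N)]; meaningful only where the series converges."""
    return sum(math.exp(s * n) * q for n, q in enumerate(extinction_time_pmf(p, n_max)))


p = [0.5, 0.3, 0.2]      # hypothetical p_0, p_1, p_2; mean 0.7 < 1
pmf = extinction_time_pmf(p, 200)
print(pmf[:3], sum(pmf), mgf_of_N(p, 0.1))
```

Here $P(N = 0) = p_0 = 0.5$ and $P(N = 1) = f(0.5) - 0.5 = 0.2$. In the subcritical case the tail of $N$ decays roughly like $\mu^n$, which is why the truncated series converges for $e^{s}\mu < 1$.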

3. Most Powerful Tests of Composite Hypotheses I. Normal Distributions.
E. L. Lehmann and C. M. Stein, University of California, Berkeley,
California.

Critical regions are determined for testing a composite hypothesis which are most powerful
against a particular alternative among all critical regions whose probabilities under the
hypothesis tested are bounded above by the level of significance. These problems have
been considered by Neyman, Pearson and others, subject to the condition that the critical
region be similar. In testing the hypothesis specifying the value of the variance of a normal
distribution with unknown mean against an alternative with larger variance, and in some
other problems, the best similar region is also most powerful in the sense of this paper.
However, in the analogous problem when the variance under the alternative hypothesis
is less than that under the hypothesis tested, in the case of Student’s hypothesis when the
level of significance is less than ½, and in some other cases, the best similar region is not
most powerful in the sense of this paper. There exist most powerful tests which are quite
good against certain alternatives in some cases where no proper similar region exists.
These results indicate that in some practical cases the standard test is not best if the class
of alternatives is sufficiently restricted.

4. On the Selection of Forecasting Formulas. Paul G. Hoel, University of
California, Los Angeles, California.

Given two competing formulas, $u = g(z_1, \ldots, z_m)$ and $v = h(z_1, \ldots, z_m)$, for forecasting
a variable $x$, a significance test possessing optimum properties is designed for deciding
whether one formula yields significantly better forecasts than the other. The test, which
turns out to be a Student $t$ test, is constructed as a test of the hypothesis $H_0: m_i = u_i$ against
the alternative $H_1: m_i = v_i$ $(i = 1, \ldots, n)$, in which it is assumed that the variables
$x_1, \ldots, x_n$, corresponding to the $n$ samples, are independently normally distributed with
means $m_i$ and variances $\sigma_i^2 = \sigma^2$.
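One way to see how a Student $t$ test can arise in this setting is sketched below. This is an assumption-laden illustration, not Hoel's construction. Write $w_i = x_i - u_i$ and $d_i = v_i - u_i$. Under $H_0$ the $w_i$ have mean $0$, and under $H_1$ they have mean $d_i$, so one may regress $w$ on $d$ through the origin and ask whether the slope is $0$ or $1$. Under $H_0$ the studentized slope has Student's distribution with $n - 1$ degrees of freedom, and large values favour $H_1$. The function name and data are hypothetical.

```python
import math


def forecast_t(x, u, v):
    """t statistic for H0: E[x_i] = u_i against H1: E[x_i] = v_i (common unknown variance).

    Regresses w = x - u on d = v - u through the origin; the slope b is 0 under
    H0 and 1 under H1. Returns (t, degrees_of_freedom).
    """
    w = [xi - ui for xi, ui in zip(x, u)]
    d = [vi - ui for vi, ui in zip(v, u)]
    sdd = sum(di * di for di in d)
    b = sum(di * wi for di, wi in zip(d, w)) / sdd
    resid_ss = sum((wi - b * di) ** 2 for wi, di in zip(w, d))
    df = len(x) - 1
    s = math.sqrt(resid_ss / df)
    return b * math.sqrt(sdd) / s, df


# Hypothetical forecasts u (formula g) and v (formula h) with observed outcomes x.
u = [0.0, 0.0, 0.0]
v = [1.0, 1.0, 0.0]
x = [1.1, 0.9, 0.1]
print(forecast_t(x, u, v))
```

With these numbers the slope is $b = 1$, the residual sum of squares is $0.03$ on 2 degrees of freedom, and $t \approx 11.55$, so the outcomes sit much closer to the forecasts $v$ than to $u$.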

5. On the Power Function of the “Best” Z-test Solution of the Behrens-Fisher
Problem. J. E. Walsh, Douglas Aircraft Company

The most powerful Z-test solution of the Behrens-Fisher problem (one-sided and
symmetrical) was obtained by Scheffé in Annals of Mathematical Statistics, Vol. 14 (1943),
pp. 35-44. This note derives (approximately) the power efficiency of this Z-test for the case
in which the ratio of the variances of the normal populations is also known. Let the Z-test
be based on $m$ sample values from the first normal population and $n$ sample values from the
second normal population, where $m < n$. For fixed values of $m$ and $n$, a symmetrical
Z-test with significance level $2\alpha$ has the same power efficiency as a one-sided Z-test with
significance level $\alpha$. For one-sided Z-tests with significance level $\alpha$, the power efficiency
is approximately $50\,[B + \sqrt{B^2 - 8(m+n)A}\,]/(m+n)$, where $B = 2 + (m+n)A + K_\alpha^2/2$,
$A = 1 - K_\alpha^2/[2(m-1)]$, and $K_\alpha$ is the standardized normal deviate exceeded with probability
$\alpha$. This approximation is reasonably accurate for $m \ge 4$ if $\alpha = .05$, $m \ge 5$ if $\alpha = .025$,
$m \ge 6$ if $\alpha = .01$, $m \ge 7$ if $\alpha = .005$. Intuitively, the power efficiency of a test measures
the percentage of available information per observation which is utilized by that test.
the percentage of available information per observation which is utilized by that test 

6. On Sequences of Experiments. Charles Stein, University of California,
Berkeley, California.

One performs a sequence of $N$ experiments to decide between two simple hypotheses
regarding probability distributions of certain observable quantities. At each stage there
is a choice among $L$ experiments, and the one chosen yields a random variable. One wishes
to achieve certain upper bounds $\alpha$ and $\beta$ to the probabilities of first and second kind errors
respectively, and, subject to these restrictions, to minimize the expected cost under a third
hypothesis. The cost of each particular sequence of experiments is known. A solution
is obtained, essentially by applying Lagrange’s method and working back from the end
of the experiment. This can be generalized to multiple decision problems. The results
are applied to two-sample tests with the second sample of variable size, and to Wald’s
sequential analysis. As another problem, suppose $(X_1, Y_1), (X_2, Y_2), \ldots$ are
independently distributed with bivariate normal distributions having mean $\xi$ and covariance matrix
$\Sigma$, both unknown. One tests $H_0: \xi = 0$ against $H_1$: […]. A test (not necessarily
optimum) valid within the usual approximation is obtained from the ratio of the p.d.f.
of Hotelling’s $T^2$ under $H_1$ to that under $H_0$. Analogous results hold for the multiple
correlation coefficient, ratio of two variances and test for linear hypothesis.
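For reference, the statistic mentioned above can be computed as $T^2 = n\,\bar{\mathbf z}'S^{-1}\bar{\mathbf z}$, where $\bar{\mathbf z}$ is the sample mean vector of the pairs and $S$ is the sample covariance matrix with divisor $n - 1$. Under $H_0$, $(n-2)T^2/[2(n-1)]$ has the $F$ distribution with $2$ and $n - 2$ degrees of freedom. The sketch below covers only this bivariate statistic, not Stein's sequential procedure or the density ratio built from it. The function name and data are hypothetical.

```python
def hotelling_t2_bivariate(pairs):
    """Return (T2, F) for H0: mean vector = 0, from a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    # Sample covariance matrix with divisor n - 1.
    sxx = sum((x - mx) ** 2 for x, _ in pairs) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in pairs) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in pairs) / (n - 1)
    det = sxx * syy - sxy * sxy
    # Quadratic form mean' S^{-1} mean, using the explicit 2x2 inverse.
    quad = (syy * mx * mx - 2 * sxy * mx * my + sxx * my * my) / det
    t2 = n * quad
    f = (n - 2) / (2 * (n - 1)) * t2   # F with (2, n - 2) d.f. under H0
    return t2, f


pairs = [(1.5, 0.0), (-0.5, 0.0), (0.5, 1.0), (0.5, -1.0)]  # hypothetical data
print(hotelling_t2_bivariate(pairs))
```

For these four points the mean is $(0.5, 0)$ and $S = \tfrac23 I$, so $T^2 = 4 \cdot 0.25 \cdot 1.5 = 1.5$ and $F = 0.5$.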

7. The Effect of Selection Above Definite Lower Limits of Linear Functions of
Normally Distributed Correlated Variables on the Means and Variances of
Other Linear Functions. G. A. Baker, University of California, Davis,
California.

Sometimes certain variables in a system can be observed before other economically or
socially important variables. These variables or linear combinations of them can be used
as a basis of selection at given levels. The question is: how does selection on these earlier
or more easily available variables affect the mean and variance of the economically or
socially more important variables or, perhaps, linear functions of the more important
variables? The general procedure is clear. We transform to a new system of variables which
contains the linear functions on which selection is performed and the linear functions of
which the means and variances are required as separate variables. The remaining new
variables are eliminated by integration. The final calculation involves the numerical
evaluation of integrals whose integrands are the product of polynomials and normal
multivariate functions and whose limits depend on the given levels of selection. The general
ideas are simple but the actual labor of computation in a given case is tedious. An example
is considered in detail.
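The simplest case of this calculation, selection on a single linear function, has a closed form that avoids numerical integration. Suppose $S = \mathbf a'\mathbf X$ and $T = \mathbf b'\mathbf X$ with $\mathbf X$ multivariate normal, and only individuals with $S > c$ are kept. Let $z = (c - \mu_S)/\sigma_S$, $\lambda = \phi(z)/[1 - \Phi(z)]$, and $\rho$ be the correlation of $S$ and $T$. Then $E(T \mid S > c) = \mu_T + \rho\sigma_T\lambda$ and $\operatorname{Var}(T \mid S > c) = \sigma_T^2[1 - \rho^2\lambda(\lambda - z)]$. The sketch below applies these standard truncated-normal formulas. It does not handle the several simultaneous lower limits treated in the paper. The function names and numbers are hypothetical.

```python
from statistics import NormalDist


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def _quad(a, cov, b):
    """a' Cov b for a covariance matrix given as a list of rows."""
    return sum(a[i] * cov[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def after_selection(a, b, mean, cov, c):
    """Mean and variance of T = b'X among individuals with S = a'X > c, X normal."""
    mu_s, mu_t = _dot(a, mean), _dot(b, mean)
    sd_s = _quad(a, cov, a) ** 0.5
    var_t = _quad(b, cov, b)
    rho = _quad(a, cov, b) / (sd_s * var_t ** 0.5)
    std = NormalDist()
    z = (c - mu_s) / sd_s
    lam = std.pdf(z) / (1.0 - std.cdf(z))   # inverse Mills ratio
    new_mean = mu_t + rho * var_t ** 0.5 * lam
    new_var = var_t * (1.0 - rho ** 2 * lam * (lam - z))
    return new_mean, new_var


# Hypothetical two-variable example: select on X1 above its mean, look at X2.
mean = [0.0, 10.0]
cov = [[1.0, 0.6], [0.6, 4.0]]
print(after_selection([1.0, 0.0], [0.0, 1.0], mean, cov, c=0.0))
```

Selecting at the mean ($z = 0$) gives $\lambda = \sqrt{2/\pi}$. Here $\rho = 0.3$, so the selected group has mean $10 + 0.6\sqrt{2/\pi} \approx 10.479$ and variance $4(1 - 0.18/\pi) \approx 3.771$.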


8. An Inversion Formula for the Distribution of a Ratio of Random Variables.
J. Gurland, University of California, Berkeley, Calif.

The repeated Cauchy principal value of integrals applied to characteristic functions is
used in obtaining inversion formulae for distribution functions. Let the random variables
$X_1$ and $X_2$ have a joint distribution function with corresponding characteristic function
$\phi(t_1, t_2)$. Suppose $P(X_2 < 0) = 0$. Let

$$\mathrm{P}\!\int_{-\infty}^{\infty} g(t)\,dt = \lim_{\varepsilon \to 0}\,\lim_{T \to \infty}\left(\int_{-T}^{-\varepsilon} + \int_{\varepsilon}^{T}\right) g(t)\,dt$$

for any function $g(t)$. If $G(x)$ is the distribution function of $X_1/X_2$, then

$$G(x) + G(x - 0) = 1 - \frac{1}{\pi i}\,\mathrm{P}\!\int_{-\infty}^{\infty} \frac{\phi(t, -tx)}{t}\,dt.$$

This formula is free of restrictions which accompany the formula given by Cramér in the
case where $X_1$ and $X_2$ are independent; and differentiation extends a result of Geary to
a much larger class of distribution functions. Further generalizations of the theory are
obtained, and as an example the distribution function of the ratio of quadratic forms of
random variables $X_1, X_2, \ldots, X_n$ is considered in the case where $X_1, X_2, \ldots, X_n$ have a
multivariate normal distribution.
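Because $\phi(-t, tx)$ is the complex conjugate of $\phi(t, -tx)$, the principal-value integral above reduces, at a continuity point $x$, to $G(x) = \tfrac12 - \frac{1}{\pi}\int_0^\infty \operatorname{Im}\phi(t, -tx)\,\frac{dt}{t}$. The sketch below evaluates this with a midpoint rule. It assumes $P(X_2 \le 0) = 0$ and that $\phi$ decays fast enough for the integral to be cut off at a finite upper limit. The function name, cutoff and step count are hypothetical choices, not part of the paper. As a check it uses $X_1$ standard normal and $X_2 \equiv 1$, for which $G$ must be the normal distribution function.

```python
import cmath
import math
from statistics import NormalDist


def ratio_cdf(phi, x, upper=40.0, steps=20000):
    """Approximate G(x) = P(X1 / X2 <= x) from the joint characteristic function phi(t1, t2).

    Uses G(x) = 1/2 - (1/pi) * integral_0^inf Im phi(t, -t x) / t dt, truncated at `upper`.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h          # midpoints, so t = 0 is never evaluated
        total += phi(t, -t * x).imag / t
    return 0.5 - total * h / math.pi


def phi_normal_over_one(t1, t2):
    """Joint characteristic function of X1 ~ N(0, 1) and X2 = 1 (degenerate)."""
    return cmath.exp(-0.5 * t1 * t1 + 1j * t2)


for x in (-1.3, 0.0, 0.7):
    print(x, ratio_cdf(phi_normal_over_one, x), NormalDist().cdf(x))
```

For a denominator with a genuine distribution, only `phi` changes. The cutoff then has to be chosen with the decay of $\phi$ in mind.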


9. Independence of Parameters and Sufficient Statistics. E. W. Barankin,
University of California, Berkeley, California.

The notions of a complete set of independent parameters and a minimal set of sufficient statistics
are suitably defined for a class of families of probability densities $p(x_1, \ldots, x_n; \ldots)$.
The order of each of these sets is determined as the rank of a certain […].
Second order continuous differentiability is eventually required of the function $p$; […]
conditions are laid down, designed to ensure that the behavior of $p$ in […] in the small when
only continuous differentiability is assumed. The problem of determining the order of a
minimal set of sufficient statistics is made, by a certain device, to become identical in character
with that of finding the order of […].
An explicit method is given for finding a complete set of independent parameters and a
minimal set of sufficient statistics.


(Presented December 30, 1947 at New York at the Annual Meeting of the Institute) 

1. Distribution of the Circular Serial Correlation Coefficient for Residuals from
a Fitted Fourier Series (Preliminary Report). R. L. Anderson, University
of North Carolina, Raleigh, North Carolina and T. W. Anderson, Columbia
University.

Given a set of $N$ observations $\{X_i\}$, which are defined as follows:

$$X_i - m_i = \rho\,(X_{i-L} - m_{i-L}) + \epsilon_i,$$

where the residuals $\{\epsilon_i\}$ are assumed to be normally and independently distributed with
zero means and equal variances, and $L$ is the lag. A statistic for testing the null hypothesis
$\rho = 0$ is ${}_L R$, the circular serial correlation coefficient of residuals $e_i$ from a regression
line fitted by least squares: $X_i = M_i + e_i$. The following regression line is considered:

$$M_i = a_0 + \sum{}' a_k \cos\frac{2\pi k i}{N} + \sum{}' b_k \sin\frac{2\pi k i}{N},$$

where $k$ ranges over some subset of the integers $1, 2, \ldots, \tfrac12(N-1)$ or $\tfrac12 N$, depending
on whether $N$ is odd or even (if $N$ is even, $b_{N/2}$ is not used). Hence ${}_L R$ is defined as

$${}_L R = \frac{e_1 e_{1+L} + e_2 e_{2+L} + \cdots + e_N e_{N+L}}{\sum_{i=1}^{N} e_i^2},$$

with $e_{N+j} = e_j$.

The distribution of this ${}_L R$ has the same general form as that presented by R. L. Anderson
for $\rho = 0$ [“Distribution of the serial correlation coefficient,” Annals of Math. Statistics
13: 1-13 (1942)]; and for $\rho \neq 0$ by W. G. Madow [“Note on the distribution of the serial
correlation coefficient,” Annals of Math. Statistics 16: 308-310 (1945)].

For $M_i$ consisting of terms of only one period, $N/k = 2, 3, 4, 6, 12$ and $24$, exact values
of the 1% and 5% significance levels of ${}_1 R$ have been computed for $N = 12$ and $24$.
Approximate significance levels have been computed for $N = 12(12)96$. More of the exact
significance levels are being computed, and all computations will be extended to include
some multiple periods and some lags greater than 1.
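The statistic ${}_L R$ can be computed directly from the definitions above. With equally spaced observations the cosine and sine columns are orthogonal, so the least-squares coefficients of $M_i$ reduce to discrete Fourier sums. The sketch below indexes observations $i = 0, \ldots, N-1$; any full period gives the same fitted values. It takes the set of harmonics `ks` as an input. The function name and test series are hypothetical.

```python
import math


def circular_serial_correlation(xs, ks, lag=1):
    """_L R of the residuals after fitting a0 + sum_k [a_k cos(2 pi k i/N) + b_k sin(2 pi k i/N)].

    Assumes every k in ks satisfies 1 <= k <= N/2; for k = N/2 only the cosine is used.
    """
    n = len(xs)
    fitted = [sum(xs) / n] * n                  # a0 term
    for k in ks:
        cos_col = [math.cos(2 * math.pi * k * i / n) for i in range(n)]
        sin_col = [math.sin(2 * math.pi * k * i / n) for i in range(n)]
        # Squared column norms: N/2 for 0 < k < N/2, and N for the cosine at k = N/2.
        if 2 * k == n:
            cols = [(cos_col, n)]
        else:
            cols = [(cos_col, n / 2), (sin_col, n / 2)]
        for col, norm2 in cols:
            coef = sum(x * c for x, c in zip(xs, col)) / norm2
            fitted = [f + coef * c for f, c in zip(fitted, col)]
    e = [x - f for x, f in zip(xs, fitted)]
    num = sum(e[i] * e[(i + lag) % n] for i in range(n))   # circular: e_{N+j} = e_j
    return num / sum(ei * ei for ei in e)


N = 12
xs = [3 + 2 * math.cos(2 * math.pi * i / N) + 0.5 * math.sin(2 * math.pi * i / N)
      + math.cos(2 * math.pi * 2 * i / N) for i in range(N)]
print(circular_serial_correlation(xs, ks=[1]))
```

Fitting only the $k = 1$ harmonic leaves the $k = 2$ cosine as the residual. Its circular lag-1 correlation is $\cos(2\pi \cdot 2/12) = 0.5$.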

2. Some New Methods for Distributions of Quadratic Forms. Harold
Hotelling, Institute of Statistics, University of North Carolina, Chapel Hill.

Any homogeneous quadratic form in normally distributed variates of zero means has
the same distribution as $q = \tfrac12(a_1 x_1^2 + \cdots + a_n x_n^2)$, where the $a_i$ are roots of a determinantal
equation based on the coefficients of the given form and the parameters of the normal
distribution, and where the $x_i$ are normally and independently distributed with zero means
and unit variances. We take $\sum a_i = n$, and begin by expanding the distribution of a positive
definite form in a series of powers of $q$ whose coefficients are polynomials in the reciprocals
of the $a_i$. This series shows the analyticity of the function, which is then expressed as
the product of a distribution function and a series of Laguerre polynomials with coefficients
which are simple polynomials in the moments of the $a_i$. Indefinite forms and certain ratios
of forms are dealt with by convolutions of these series and by other means.
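Take a form $\mathbf y'A\mathbf y$ with $\mathbf y$ normal, zero mean and covariance matrix $\Sigma$. The roots in question can be taken as the solutions of $|A - \lambda\Sigma^{-1}| = 0$, that is, the eigenvalues of $A\Sigma$. The form is then distributed as $\sum_i \lambda_i x_i^2$. The $a_i$ above are proportional to these $\lambda_i$, with a constant that depends on the normalization chosen. The sketch below computes them with NumPy through a Cholesky factor $\Sigma = LL'$, since $L'AL$ is symmetric and similar to $A\Sigma$. It assumes $A$ is symmetric. The function name and matrices are hypothetical.

```python
import numpy as np


def form_roots(A, Sigma):
    """Eigenvalues lambda_i of A @ Sigma, so that y'Ay ~ sum_i lambda_i x_i**2 for y ~ N(0, Sigma)."""
    A = np.asarray(A, dtype=float)
    L = np.linalg.cholesky(np.asarray(Sigma, dtype=float))   # Sigma = L @ L.T
    return np.linalg.eigvalsh(L.T @ A @ L)                     # symmetric; ascending order


A = [[1.0, 0.0], [0.0, 0.0]]          # hypothetical form y1**2
Sigma = [[1.0, 0.5], [0.5, 1.0]]
lam = form_roots(A, Sigma)
print(lam, lam.sum(), np.trace(np.asarray(A) @ np.asarray(Sigma)))
```

The sum of the roots equals $\operatorname{tr}(A\Sigma) = E(\mathbf y'A\mathbf y)$. Here the form is just $y_1^2$ with $y_1$ standard normal, so the roots are $0$ and $1$.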
3. Frequency Functions Defined by the Pearson Difference Equation. Leo
Katz, Michigan State College, East Lansing, Michigan.

Frequency “links” formed from the Pearson difference equation provide an efficient
means of fitting functions to observed distributions. These links, involving three constants
which are determined by the first four moments of the observed series, correspond to a
three-parameter family of discrete frequency functions. This family of functions is just
as broad as that defined by the differential equation, containing functions of equally diverse
types; in addition, it has the very important advantage that the graduation process is the
same for any type. Further, the simpler functions of the family all correspond to points
lying in one plane of the parameter space. This plane, giving a two-parameter family
of functions (depending upon the first three moments), is studied intensively, rather
complete results being obtainable for areas, moments, sampling characteristics of moments,
etc. It is also shown that the problem of discrimination among simple discrete frequency
functions for graduating observed data is resolvable (in the plane) to the sampling
distribution of one statistic. A special case of the two-parameter family depending on only the
first two moments was previously discussed.
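As an illustration of fitting by a difference equation, the sketch below assumes the two-parameter recursion $p_{x+1}/p_x = (\alpha + \beta x)/(x + 1)$, the form commonly called the Katz family. The abstract does not state its equations, so this particular recursion is an assumption. Its mean and variance are $\mu = \alpha/(1-\beta)$ and $\sigma^2 = \alpha/(1-\beta)^2$, so the first two moments give $\beta = 1 - \mu/\sigma^2$ and $\alpha = \mu^2/\sigma^2$. The case $\beta = 0$ gives the Poisson, $\beta < 0$ the binomial, and $0 < \beta < 1$ the negative binomial. The function names are hypothetical.

```python
def katz_parameters(mean, var):
    """Moment estimates (alpha, beta) for p_{x+1} / p_x = (alpha + beta x) / (x + 1)."""
    beta = 1.0 - mean / var
    alpha = mean * mean / var
    return alpha, beta


def katz_pmf(alpha, beta, x_max=200):
    """Probabilities p_0, p_1, ... from the recursion, normalized over the computed range.

    The recursion stops when alpha + beta * x <= 0 (the binomial case, beta < 0) or at
    x_max; for 0 < beta < 1 the normalization is therefore a truncated approximation.
    """
    weights = [1.0]
    for x in range(x_max):
        factor = (alpha + beta * x) / (x + 1)
        if factor <= 0:
            break
        weights.append(weights[-1] * factor)
    total = sum(weights)
    return [w / total for w in weights]


print(katz_pmf(*katz_parameters(2.0, 1.0)))      # moments of binomial(4, 1/2)
print(katz_pmf(*katz_parameters(2.0, 2.0))[:3])  # moments of Poisson(2)
```

With mean 2 and variance 1 the fit gives $\alpha = 4$, $\beta = -1$. The recursion then terminates after $x = 4$ and reproduces the binomial $(1, 4, 6, 4, 1)/16$.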

4. Distribution of the Sum of Roots of a Determinantal Equation under a
Certain Condition. D. N. Nanda, Institute of Statistics, University of North
Carolina, Chapel Hill.

Let $x = \|x_{ij}\|$ and $x^* = \|x^*_{ij}\|$ be two $p$-variate sample matrices with $n_1$ and $n_2$ degrees
of freedom. Then $S = xx'/n_1$ and $S^* = x^*x^{*\prime}/n_2$ are, under the null hypothesis, independent
estimates of the same population covariance matrix. The distribution of a root, specified
by its rank order, of the determinantal equation $|A - \theta(A + B)| = 0$, where $A = n_1 S$
and $B = n_2 S^*$, has already been given by S. N. Roy, and by the author, who has also
obtained the limiting distribution of any root when one of the samples becomes infinitely
large. The moment generating function of the sum of the roots when $n_1 = p \pm 1$ can be
derived from the limiting distribution of the largest root. The probability distributions
of the sum of roots under this condition have been formulated for the determinantal
equations having two, three, and four roots. The moments of these distributions have also
been obtained. The method is applicable for the determinantal equation of any order.
These probability distributions can easily be tabulated, as they involve only simple
algebraic and incomplete beta functions.
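Numerically, the roots of $|A - \theta(A+B)| = 0$ are the eigenvalues of $(A+B)^{-1}A$. They all lie in $[0, 1]$ when $A$ and $B$ are non-negative definite and $A + B$ is non-singular. The sketch below computes them and their sum through a Cholesky factor of $A + B$. It says nothing about their sampling distribution. The function name and matrices are hypothetical.

```python
import numpy as np


def determinantal_roots(A, B):
    """Roots theta of |A - theta (A + B)| = 0, in ascending order."""
    A = np.asarray(A, dtype=float)
    L = np.linalg.cholesky(A + np.asarray(B, dtype=float))   # A + B = L @ L.T
    Linv = np.linalg.inv(L)
    return np.linalg.eigvalsh(Linv @ A @ Linv.T)             # symmetric, same roots


A = [[1.0, 0.0], [0.0, 3.0]]     # hypothetical n1 * S
B = [[1.0, 0.0], [0.0, 1.0]]     # hypothetical n2 * S*
theta = determinantal_roots(A, B)
print(theta, theta.sum())
```

For these diagonal matrices the roots are $1/2$ and $3/4$, with sum $1.25$.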

5. Applications of Carnap's Probability Theory to Statistical Inference.
Gerhard Tintner, Iowa State College, Ames, Iowa.

The new theory of probability of Rudolf Carnap (“On inductive logic,” Philosophy of
Science, vol. 12, 1945, pp. 72 ff.; “The two concepts of probability,” Philosophy and
Phenomenological Research, vol. 5, 1944, pp. 513 ff.) introduces a distinction between
probability$_1$, the degree of confirmation, and probability$_2$, related to relative frequency. It is
believed that the ideas developed are useful in clarifying the problems of statistical inference.

As an example, consider the case of “inverse inference,” i.e., inference from a sample to
the population. The evidence is that in a sample of size $s$ there are $s_1$ individuals with
a certain property $M$ and $s_2 = s - s_1$ without the property. The hypothesis is that in the
population consisting of $n$ individuals there are $n_1$ individuals with property $M$ and
$n_2 = n - n_1$ individuals without this property. The degree of confirmation is then:

In this formula we have: $w_1$, the logical width of the property $M$; $w_2$, the logical width of
the property non-$M$; $k = w_1 + w_2$. It should be noted that for $w_1 = w_2 = 1$ the formula
becomes the classical result, i.e., a term of the hypergeometric distribution.

This idea may be applied to statistical estimation. We could for instance choose $n_1$
in such a fashion that $c^*$ becomes a maximum. This would be estimation by the principle
of maximum degree of confirmation, analogous to maximum likelihood. In a similar fashion,
we may also use $c^*$ to establish limits for $n_1$ similar to confidence or fiducial intervals.

6. Circular Probable Error of an Elliptical Gaussian Distribution. Hallett H.
Germond, S. W. Marshall & Co., Consulting Engineers, Washington, D. C.

Preliminary tables are presented, giving the radii of distribution-centered circular 
cylinders enclosing various percentages of the volume under an elliptical bivariate Gaussian 
surface. These tables are further interpreted in terms of a correlated bivariate Gaussian 
distribution. The application of these tables to impact analysis is illustrated. 
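A value from such a table can be reproduced numerically. Let the bivariate distribution have principal standard deviations $\sigma_1, \sigma_2$; for a correlated distribution these are the square roots of the eigenvalues of its covariance matrix. In polar coordinates, the probability within radius $R$ of the center is $P(R) = \frac{1}{2\pi\sigma_1\sigma_2}\int_0^{2\pi}\frac{1 - e^{-c(\theta)R^2/2}}{c(\theta)}\,d\theta$, with $c(\theta) = \cos^2\theta/\sigma_1^2 + \sin^2\theta/\sigma_2^2$. The radius for a given percentage is then found by bisection. The function names, step counts and the use of bisection are hypothetical choices. The circular case gives the check $R_{50} = \sigma\sqrt{2\ln 2}$.

```python
import math


def principal_sigmas(var_x, var_y, cov_xy):
    """Square roots of the eigenvalues of a 2x2 covariance matrix."""
    half_trace = (var_x + var_y) / 2
    spread = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    return math.sqrt(half_trace + spread), math.sqrt(half_trace - spread)


def prob_within(radius, s1, s2, steps=2000):
    """P(X^2 + Y^2 <= radius^2) for independent X ~ N(0, s1^2), Y ~ N(0, s2^2)."""
    total = 0.0
    for k in range(steps):
        theta = 2 * math.pi * (k + 0.5) / steps
        c = math.cos(theta) ** 2 / s1 ** 2 + math.sin(theta) ** 2 / s2 ** 2
        total += (1.0 - math.exp(-c * radius ** 2 / 2)) / c
    return total * (2 * math.pi / steps) / (2 * math.pi * s1 * s2)


def radius_for(prob, s1, s2, tol=1e-10):
    """Radius of the centered circle containing `prob` of the distribution (bisection)."""
    lo, hi = 0.0, 10.0 * max(s1, s2)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_within(mid, s1, s2) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(radius_for(0.5, 1.0, 1.0), math.sqrt(2 * math.log(2)))
print(radius_for(0.5, *principal_sigmas(4.0, 1.0, 1.0)))
```

Because the integrand is smooth and periodic in $\theta$, the equally spaced rule converges quickly. A few thousand steps are ample for table-level accuracy.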


(Presented December 29, 1947 at the Chicago Meeting of the Institute)

1. The Asymptotic Analogue of the Theorem of Cramér and Rao. Herman
Rubin, Institute for Advanced Study, Princeton, N. J.

The author generalizes the results of Cramér and Rao on the minimum variance of
estimates to the case of the asymptotic distribution of an estimate. He shows that if certain
regularity conditions are satisfied, the formula given by Cramér and Rao remains valid.
The main results are obtained in the case of consistent estimates, but with a stronger set
of hypotheses, the results remain true for estimates which are not consistent. The method
used to obtain these results is to construct statistics to which the theorem of Cramér and
Rao can be applied, and whose variance converges to the variance of the limiting
distribution. This procedure is also applied to the case in which there is no limiting distribution,
and in which two sequences of distributions are considered which act as if they approach
each other.
Sequential Analysis. Abraham Wald. John Wiley and Sons, Inc. pp. vi, 212,
$4.00.

Reviewed by M. A. Girshick 
Douglas Aircraft Company 

The development of sequential analysis as a new tool of statistics is by and
large the work of Abraham Wald. This fact in itself would make the
appearance of a book by him on this subject an important event. However, Wald in
this book did more than discuss the present status of sequential theory. He
has, in fact, written a very lucid treatise on the general subject of statistical
inference, a treatise which is likely to have great influence on statistical
thinking.

While this book is not written for the mathematically untrained, a knowledge
of differential and integral calculus will suffice to follow all the arguments
except perhaps for some sections in the appendix where the more complicated
proofs have been placed.

The main body of this book is divided into 3 parts and 11 chapters. Part I,
covering chapters 1 to 4 inclusive, deals with the general theory of the sequential
probability ratio test. Chapter 1 introduces in an elementary fashion the
notion of probability distributions, tests of hypotheses and the Neyman-Pearson
theory of two-valued decisions based on a fixed sample size. In Chapter 2,
the general notion of a sequential test procedure is introduced and the operating
characteristics of such tests are discussed. Chapter 3 deals with the sequential
probability ratio test for testing a single hypothesis against a single alternative.
Here the boundaries of this sequential criterion are expressed in terms of the
risks, the operating characteristic and the average sample number functions
are developed, and bounds are obtained for the errors arising from truncation
and neglect of excess over the boundaries. Chapter 4 presents a sequential
theory for testing simple and composite hypotheses against a set of alternatives.
The fundamental idea introduced is the concept of a weight function in the
parameter space which permits handling composite hypotheses, or simple
hypotheses with many alternatives, by means of the sequential probability ratio
test.
Part II of this book, consisting of chapters 5 to 9 inclusive, deals with the
[…] special problems. Chapter 5 contains a […] with specific reference to
lot-by-lot acceptance inspection. Of special interest in this chapter is the
derivation of the exact characteristic function for a large class of tests and the
development of upper […] the effect of grouping on the OC and ASN curves.
Chapter 6 deals with the problem of double dichotomies. A procedure for testing
the difference between the parameters of two binomial distributions is developed
for the fixed size as well as the sequential procedure. Chapters 7, 8, and 9 are
concerned with the application of sequential analysis to the normal distribution. 
In these chapters the sequential probability ratio test is applied to hypotheses 
concerning the mean of a normal distribution when the variance is known, when 
the variance is not known (non-central t case) and hypotheses concerning the 
variance when the mean is known and when the mean is not known. 
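As a rough illustration of the Part I and Chapter 7 material, here is a minimal Python sketch (not taken from Wald's book) of a sequential probability ratio test of H0: mean = theta0 against H1: mean = theta1 for normal observations with known sigma. It assumes Wald's usual approximate boundaries A = (1 - beta)/alpha and B = beta/(1 - alpha), which ignore the excess over the boundaries mentioned above; the function names sprt_bounds and sprt_normal_mean are hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple


def sprt_bounds(alpha: float, beta: float) -> Tuple[float, float]:
    """Approximate (log B, log A) stopping bounds for the log likelihood ratio.

    Assumes Wald's approximations A = (1 - beta)/alpha and B = beta/(1 - alpha),
    i.e. the excess over the boundaries is neglected.
    """
    log_a = math.log((1 - beta) / alpha)
    log_b = math.log(beta / (1 - alpha))
    return log_b, log_a


def sprt_normal_mean(xs: Iterable[float], theta0: float, theta1: float,
                     sigma: float, alpha: float = 0.05, beta: float = 0.10
                     ) -> Tuple[Optional[str], int]:
    """Sequential test of mean = theta0 vs mean = theta1 with sigma known.

    Returns (decision, n): decision is "H0", "H1", or None if the data
    ran out before either boundary was reached; n is observations used.
    """
    lower, upper = sprt_bounds(alpha, beta)
    llr = 0.0
    n = 0
    for x in xs:
        n += 1
        # Log likelihood ratio increment for one N(theta, sigma^2) observation:
        # log f1(x)/f0(x) = (theta1 - theta0)/sigma^2 * (x - (theta0 + theta1)/2)
        llr += ((theta1 - theta0) / sigma ** 2) * (x - (theta0 + theta1) / 2)
        if llr >= upper:
            return "H1", n   # reject H0
        if llr <= lower:
            return "H0", n   # accept H0
    return None, n           # no decision yet: keep sampling


if __name__ == "__main__":
    print(sprt_normal_mean([1.0] * 20, theta0=0.0, theta1=1.0, sigma=1.0))
```

With theta0 = 0, theta1 = 1, sigma = 1, alpha = 0.05 and beta = 0.10, every observation equal to 1.0 adds 0.5 to the log ratio, so the upper boundary log 18 (about 2.89) is first crossed at the sixth observation and the call returns ("H1", 6).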

Part III consists of two short chapters and deals with multi-valued decisions
and sequential interval estimation. The results in these chapters are not
definitive answers to the two outstanding problems in statistical inference but are
merely suggestive of a possible approach to them. Nevertheless, from the
point of view of stimulating future research these two chapters are perhaps the
most valuable sections of this book. The reader, having been exposed in the
previous chapters to various tests the outcome of which is a two-valued decision,
is naturally led in Chapter 10 to the consideration of tests the outcome of which
is a multi-valued decision. The notion of a risk function, introduced elsewhere
by the author in the non-sequential case, is again used as the main tool in handling
multi-valued decisions sequentially. In Chapter 11 the important problem of
setting up confidence intervals of fixed length by means of a sequential
procedure is discussed and a possible method for accomplishing this is indicated.

As was previously noted, the main theorems on sequential analysis are
contained in the Appendix, and since they have all been previously published in the
Annals they will not be mentioned in the present review. The Appendix,
together with the main body of the book, forms a fairly exhaustive treatment of
sequential theory. A notable exception to this is the lack of any mention of the
published research on sequential point estimation. This is probably accounted
for by the fact that this research came too late to be included in the book. Other
minor omissions that may be noted are references to the generalization of the
Fundamental Identity to more than one dimension and other theorems on
sequences of functions of random vectors which have appeared in print. Also
no mention is made of the similarity of sequential analysis to the problems of
the random walk and the gambler’s ruin. This, in the opinion of the reviewer,
is regrettable.

This book will make a very suitable companion to the book Sequential Analysis
of Statistical Data: Applications prepared by the Statistical Research Group,
Columbia University (see review by J. W. Tukey, Ann. of Math. Stat. Vol.
xviii, 1947). While there is some overlap in the material covered, the two books
differ in emphasis. Wald's book, though not highly technical, is more in the
nature of a textbook on the theory and application of sequential analysis. The
SRG book, on the other hand, was prepared mainly for statisticians who may
wish to use sequential analysis in practice. The latter book is therefore more
detailed and puts less emphasis on the theoretical aspects of the sequential
procedure.

The book is surprisingly free of typographical errors, which is a tribute to the
high quality of the editorship.
Statistical Methods. George W. Snedecor. Ames, Iowa: The Iowa State
College Press, Inc., 1940; pp. xvi, 485, $4.50.

Reviewed by Frederick Mosteller
Harvard University 


Statistical Methods is a non-mathematical treatment of modern experimental
statistics. Few non-mathematical books are available that treat such topics
as confidence limits, use of transformations, and analysis of variance and
covariance in the detail presented by Snedecor. The examples are largely, but not
entirely, drawn from agriculture and animal husbandry. The exercises for
students are extensive and thought-provoking.

Unlike most non-mathematical texts the book under review does not spend 
pages and pages on methods of recording frequencies and methods of computing 
countless moments which are seldom used in the later developments of the text. 
There is no long exasperating discussion of kurtosis and skewness; and there is 
no parade of qualitative Greek names for categorizing frequency distributions. 

The reviewer has used this book for teaching a second course in statistics to
social science majors with reasonable success. The main disadvantage was the
biological nature of most of the examples, but until some author writes a
comparable book using social science examples, the reviewer will continue to use
Snedecor’s material for a large part of the course.

The main differences between the Third and Fourth Editions of this text have 
been adequately summarized by Snedecor: 

“(i) greater emphasis has been placed on the theoretical conditions in which
the various statistical methods have validity, and concurrently (ii) on the conduct
of the experiment so as to incorporate in the data the information desired; (iii)
estimates and fiducial statements have been brought into equal prominence
with tests of hypotheses; (iv) there is increased reliance on experimental
samplings to exemplify distribution theory; (v) the treatment of correlation and of
experimental designs has been expanded; and (vi) the methods for
disproportionate subclass numbers have been extended to include all those necessary for
ordinary needs.” Some more obvious changes in the Fourth Edition are the
entirely new type and summaries which are included at the end of some of the
chapters. The practice of using random sampling numbers (iv) to help explain
theory has long been employed by teachers of statistics, but few authors have
taken as much advantage of this technique as has Snedecor. In the Fourth
Edition confidence intervals are widely used (iii). The author uses the
adjectives “confidence” and “fiducial” more or less interchangeably, but it is the
reviewer’s opinion that it is the Neyman concept rather than the Fisherian that
predominates. It should be remarked that this is one of the few texts that
give the students the idea that in linear regression we do not predict y with the
same accuracy for every x even when linearity and homoscedasticity hold (v).
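The regression point can be made concrete with the standard least-squares formula: the estimated standard error of a prediction at x0 is s * sqrt(1 + 1/n + (x0 - x_bar)^2 / sum((x - x_bar)^2)) for a single new y (drop the leading 1 when estimating the mean of y at x0), so it grows as x0 moves away from x_bar. The sketch below assumes linearity, homoscedasticity and independent errors; prediction_se is a hypothetical name, and the data in the usage note are made up purely for illustration.

```python
import math
from typing import Sequence


def prediction_se(xs: Sequence[float], ys: Sequence[float], x0: float,
                  individual: bool = True) -> float:
    """Estimated standard error of a least-squares straight-line prediction at x0.

    individual=True: error for predicting a single new y at x0;
    individual=False: error for estimating the mean of y at x0.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points to estimate the residual variance")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx              # fitted slope
    a = y_bar - b * x_bar      # fitted intercept
    # Residual mean square on n - 2 degrees of freedom.
    s2 = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    # The (x0 - x_bar)^2 term is what makes accuracy depend on x0.
    factor = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    if individual:
        factor += 1.0
    return math.sqrt(s2 * factor)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    for x0 in (3.0, 5.0, 8.0):
        print(x0, prediction_se(xs, ys, x0))
```

With these illustrative data the printed standard error is smallest at x0 = 3 (the mean of the x values) and increases at x0 = 5 and again at x0 = 8.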

The main emphasis of the book is on the analysis of variance. The author
succeeds extremely well in showing the student how to carry out the analysis
even at rather complex levels. On some other points he was not quite so
successful. For example, the reviewer feels that the meaning of “interaction” was
never gotten across, and that for the student the higher order interactions are
still just things to be computed. Furthermore, in attempting to make sure
that the student understands how to do the computation, the author often does
not encourage the student to take any overall view of the data before blindly
starting to compute. In addition, reasons for doing the experiment are
sometimes vague and the conclusions are often couched only in the jargon of analysis
of variance. Therefore, the student seldom gets an opportunity to find out
what kinds of recommendations might reasonably be made as the result of an
experiment. Perhaps the worst example is on pages 275-280. Here the
experiment deals with yield of wheat in 48 pots, with two series of soil treatments,
humus and chemical. Anyone glancing over the results of the experiment will
be startled to find that every yield from pots with “no humus treatment” (12
observations) is greater than any yield with “humus treatment” (36
observations). The reader will be further startled to find that all the evidence tends
to support the notion that “no chemical treatment” is at least as fruitful as any
of the chemical treatments tried. However, Snedecor says “The striking feature
of this experiment is the discrepance among the subclasses. The chemicals
applied to one humus treatment produced yields out of accord with those from
other humus treatments.” Snedecor then pushes on to a more subtle analysis.
The reviewer feels that here, as elsewhere in the book, the author occasionally
forgets that the extended analysis looks rather ridiculous unless the practicality
of applying the technique is discussed. The example considered here is one in
which the point could profitably be made that everyone can see from a visual
examination of the data what the results of the experiment show. The analysis
backs up the student’s common sense appraisal of the situation and gives him
more confidence in and understanding of the method when it is applied in more
delicate situations. It seems to the reviewer that too many times the
application of the analysis of variance obfuscates the main point of the experiment.
In the haste to get to the computations and the comparisons of interactions and
errors the author frequently neglects to impress the student with the
fundamental differences between means and their ultimate interpretation. However,
the author does bring out clearly the notion of the various estimates of variance,
a subject frequently neglected.

In the next to last chapter the binomial and Poisson distributions are discussed.
In this connection the inverse sine and the square root transformations are
treated briefly, as is the logarithmic transformation. It is surprising that no
indication is given of the theoretical variances when the inverse sine and square
root transformations are used. The theoretical discussion of the transformation
is limited to the remark that these transformations tend to make the variance
independent of the means, but there is no indication of the further advantages.
This is surprising because in a much earlier chapter the use of Fisher’s
transformation for correlation coefficients was treated quite adequately. It seems
to the reviewer that in a later edition the use of transformation might well be
moved forward in the book, and that the theoretical and practical implications
might be treated more thoroughly.
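To illustrate what "tend to make the variance independent of the means" amounts to for the square root transformation, the sketch below computes the exact variance of sqrt(X) for a Poisson variable X with mean mu by summing the probability function. The raw variance of X equals mu, whereas Var(sqrt X) settles near 1/4 once mu is moderately large (the usual large-mu approximation is about 1/4 + 3/(32 mu)). This is an assumed illustration, not material from Snedecor's text, and poisson_var_of_sqrt is a hypothetical name.

```python
import math


def poisson_var_of_sqrt(mu: float) -> float:
    """Variance of sqrt(X) for X ~ Poisson(mu), by direct summation of the pmf.

    Uses Var(sqrt X) = E[X] - (E[sqrt X])^2. The tail beyond
    mu + 20*sqrt(mu) + 50 is negligible for the purposes of this sketch.
    """
    if not 0 < mu <= 700:
        raise ValueError("mu must be in (0, 700]; exp(-mu) underflows beyond that")
    kmax = int(mu + 20 * math.sqrt(mu) + 50)
    p = math.exp(-mu)        # P(X = 0)
    e_sqrt = 0.0             # running E[sqrt(X)]
    e_x = 0.0                # running E[X]
    for k in range(kmax + 1):
        e_sqrt += p * math.sqrt(k)
        e_x += p * k
        p *= mu / (k + 1)    # P(X = k + 1) from P(X = k)
    return e_x - e_sqrt ** 2


if __name__ == "__main__":
    for mu in (1, 5, 25, 100):
        print(mu, round(poisson_var_of_sqrt(mu), 4))
```

For mu = 1 the variance of sqrt(X) is still about 0.4, but by mu = 100 it is within about 0.001 of 1/4, while the variance of X itself has grown from 1 to 100.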

As in most other texts the final chapter "Design and Analysis of Samplings” 
needs very considerable expansion. 

The book begins (Chapter 1) with a consideration of the sampling of attributes,
inferences that can be drawn about the population, confidence limits, use of
chi-square in a 1 x 2 table, and some discussion of the use of ratios, rates, and
percentages. Measurement data are then (Chapter 2) discussed, including the
computation and application of the mean, range, standard deviation, probable
deviation, median, and quartiles. The concepts of null hypothesis and
confidence limits are introduced in Chapter 2 and elaborated in Chapter 3, which
concerns sampling from a normally distributed population, random samples,
distribution of the mean, variance, standard deviation, and of t. The
comparison of two groups in contrast to individuals is treated in Chapter 4, including
groups with different numbers of individuals. Chapter 5 provides material on
short cut methods of computation using calculating machines; code numbers
are explained, suggestions about significant numbers and rates and percentages
are given, and the use of the ratio range/sigma is introduced.

After considering linear regression and correlation (Chapters 6, 7) the author
relates the two notions, and then goes on to consider some interesting special
cases of correlation. Chapter 8 deals with large sample methods. Chapter 9
concerns enumeration data with more than one degree of freedom, discusses
adjustments of chi-square and its computation with large numbers of degrees of
freedom, and describes the analysis of 2 x 2 x 2, R x 2, and R x C tables. The
computation of the analysis of variance for two or more groups of measurement
data and with two or more criteria of classification, variance ratio F, use of
Latin square, analysis with disproportionate subclass numbers, and the use of
randomized blocks are considered in Chapters 10 and 11, while analysis of
covariance is treated in Chapter 12 (22 pages). Multiple regression, including
partial and multiple correlation coefficients, tests of significance and confidence
limits, is handled in Chapter 13, and curvilinear regression is considered in Chapter
15. Chapter 16 deals with binomial and Poisson data, and Chapter 17 discusses
the design and analysis of sampling, including sampling from a homogeneous
or small population and the effectiveness of stratification.

It seems to the reviewer that at the present time one would be hard put to
find a better statistics text written at this level.



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Dr. Franz L. Alt, who has been with the Econometric Institute, New York,
as Assistant Director of Research, is now Deputy Chief of the Computing
Laboratory at the Ballistic Research Laboratories, Aberdeen Proving Ground,
Aberdeen, Maryland.

Mr. A. George Carlton has accepted a position as Assistant Professor of
Mathematics at the University of Illinois. 

Assistant Professor Paul R. Halmos, University of Chicago, Chicago, Illinois,
is on leave for the academic year. He is spending the year at the Institute for
Advanced Study, Princeton, New Jersey, on a Guggenheim Fellowship and
will return to the University of Chicago in September, 1948.

Mr. Henry F. Hebley of the Pittsburgh Coal Co. spent most of last summer
in Eastern Europe carrying out a survey on coal production and fuel
availability in Poland. This work was carried out in the interest of the International
Bank for Reconstruction and Development.

Dr. Harold D. Larsen, former Associate Professor at the University of New
Mexico, has joined the faculty of Albion College, Albion, Michigan. 

Mr. Dickson H. Leavens has resigned as Research Associate of the Cowles
Commission for Research in Economics. He will continue as Managing Editor 
of Econometrica and may be addressed at 1632 Wood Avenue, Colorado Springs, 
Colorado. 

Professor S. B. Littauer, who has been Chairman of the Mathematics
Department, Newark College of Engineering, Newark, New Jersey, has now accepted
an associate professorship in the Department of Industrial Engineering, Columbia 
University. 

Professor Harris F. MacNeish, who has been Chairman of the Department of
Mathematics at Brooklyn College since its foundation in 1930, has resigned
to accept a visiting professorship in Mathematics at the University of Miami,
Coral Gables, Florida.

Mr. Clifford J. Maloney has resigned a position as Research Associate in the 
Statistical Laboratory of Iowa State College to serve as Chief, Statistics Branch, 
Camp Detrick, Frederick, Maryland, an agency of the Chemical Corps of the 
United States Army. 

Mr. Monroe L. Norden, who has formerly been with the Ballistic Research 
Laboratories, Aberdeen Proving Ground, Maryland, has accepted a research 
position in theoretical or mathematical statistics at the Douglas Aircraft Co., 
Santa Monica, California. 

Mr. W. E. Pattee has resigned his position as statistical engineer with the
Canadian Industries Limited, Shawinigan Falls, Quebec, and has accepted a
position as senior chemist, Ottawa Mill, E. B. Eddy Company, Hull, Quebec.
Mr. Robert I. Piper, who was formerly plant staff assistant at the Southern
California Telephone Company of Los Angeles, has been transferred to the
systems office of the Pacific Telephone and Telegraph Company. He will assist
in planning and analysing sampling surveys of the wage rates prevailing in
the Pacific coast states in which the company operates.

Mr. Herbert Solomon, who was formerly an instructor at the College of the
City of New York, has accepted an assistant professorship in the Mathematics 
Department, Newark College of Engineering, Newark 2, New Jersey. 

Dr. A. G. Swanson, formerly an assistant chairman of the Department of
Mathematics and Mechanics at the General Motors Institute, Flint, Michigan, 
has accepted an associate professorship in the Department of Mathematics, 
Gustavus Adolphus College, St. Peter, Minnesota. 


A federal center of applied mathematics, the National Applied Mathematics
Laboratories, has been established as a division of the National Bureau of
Standards. The new organization is oriented around modern mathematical
statistics as applied to the physical and engineering sciences and to the
development and use of modern high speed computing. The applied mathematics
laboratories include four separate laboratories: the Institute of Numerical
Analysis; the Computation Laboratory; the Statistical Engineering Laboratory;
and the Machine Development Laboratory.

Two members of the Institute have been given important positions in this
organization. Dr. John Curtiss, who has been Director’s Assistant in Applied
Mathematics at the Bureau of Standards, has been named Chief of the National
Applied Mathematics Laboratories. Dr. Churchill Eisenhart has been
appointed head of the Statistical Engineering Laboratory.


Statistical Summer Sessions at the University of California, Berkeley 

Following the encouraging experience of last year, the University of California
offers statistical programs in the two Summer Sessions of 1948. The teaching
staff is as follows: 

Raj Chandra Bose, Professor of the University of Calcutta, India. 

Miss Evelyn Fix, Lecturer at the University of California, Berkeley. 

Erich L. Lehmann, Assistant Professor of the University of California, 
Berkeley. 

Michel Loève, Reader at the University of London, England.

Jerzy Neyman, Professor of the University of California, Berkeley. 

Abraham Wald, Professor of Columbia University, New York. 

Courses in statistics are offered on both the graduate and the undergraduate
levels. The graduate courses, all given during the First Summer Session, June 21
to July 31, are meant primarily for students who either have already obtained
their Ph.D. degree or are working towards it. Therefore, apart from formal
classes, it is proposed to hold extensive seminars in which the work of students
will be discussed. No specific prerequisites to graduate courses will be required.
However, to benefit from the courses, the students must be generally familiar
with the theory of statistics. In addition, courses 272 and especially 271 will
require a reasonable knowledge of the theory of functions.

There will be two undergraduate courses offered, course S12 during the First
Summer Session, June 21 to July 31, and course S113 during the Second Summer
Session, August 2 to September 11. Both of these courses were recently
introduced into the curriculum and are prerequisites to more advanced courses
in statistics. They are offered during the Summer Sessions for the benefit of
students, otherwise advanced, who plan to attend more advanced courses in
statistics during the fall semester. Besides, course S12 is recommended for
students who do not intend to specialize in statistics but wish to acquire some
knowledge of this subject as a part of their general education.

The Statistical Laboratory will be available for students doing research. 

First Summer Session

S12. Elements of Probability and Statistics. Mr. Lehmann.

271. Random Functions. Mr. Loève.

272. Sequential Analysis. Mr. Wald.

273. Design of Experiments. Mr. Bose.

S290s. Seminar in Theory of Statistics. Mr. Loève, Mr. Wald.

290t. Seminar in Design of Experiments. Mr. Bose.

S295. Individual Research. Mr. Bose, Mr. Loève, Mr. Neyman, Mr. Wald.


Second Summer Session 

S113. Second Course in Probability and Statistics. Miss Fix. 


Statistical Sessions at Alabama Polytechnic Institute 

Professor George W. Snedecor, President of the American Statistical
Association and Research Professor of Statistics at Iowa State College, will be Visiting
Research Professor of Statistics at Alabama Polytechnic Institute during the
Spring Quarter, from March 22 to June 4, 1948. Professor Snedecor will
lecture on Statistical Experimental Design and will be available for statistical
consultations.

The newly formed Statistical Laboratory at A.P.I. will also offer a course
in Survey Sampling during the Spring Quarter to be taught by the Director,
Professor T. A. Bancroft. Conferences in applied statistics for research workers
in the lower southeastern states are being scheduled during the time of
Professor Snedecor's visit.
New Members

The following persons have been elected to membership in the Institute
(September 1 to November 30, 1947):


Afzal, M., M.A. (Punjab, India) Graduate student at Columbia Univ., 1038 John Jay Hall,
Columbia University, New York 27, New York.

Billeter, Ernest P., Ph.D. (Univ. of Basle) Scientific Assistant (Statistical Office, Zurich),
Turnerstrasse 28, Basle, Switzerland.

Bishop, David James, M.Sc. (London) Head of Operational Research Section of British
Iron and Steel Research Association, 11 Park Lane, London W. 1, England.

Brooks, Hamilton, B.Sc. (Univ. of Pittsburgh) Design Engineer, Westinghouse Electric
Corp., P.O. Box 888, E. Pittsburgh, Pennsylvania.

Craw, Alexander R., M.S. (Univ. of Notre Dame) Instructor in Math., U. S. Naval
Academy, Annapolis, Maryland.

Edwards, Daisy M., A.M. (Columbia Univ.) Lecturer in Statistics, University of
London, Institute of Education, 1, Oakfield Court, Queens Road, Weybridge, Surrey,
England.

Havermark, K. Gunnar, Chief of Division, Royal Social Board, Lagerlofsg. 8, Stockholm,
Sweden.

Hollingsworth, Charles A., Ph.D. (State Univ. of Iowa) Research Chemist, 801 Maple
Ave., Waynesboro, Virginia.

Hurd, Cuthbert C., Ph.D. (Univ. of Ill.) Plant Statistician, Carbide and Carbon
Chemicals Corp., Oak Ridge, Tenn.

Isaacson, Stanley L., M.A. (Johns Hopkins Univ.) Graduate student at Columbia
Univ., 2523 Loyola Southway, Baltimore, Maryland.

May, Kenneth, Ph.D. (Univ. of Calif.) Assistant Professor of Mathematics, Carleton
College, Northfield, Minnesota.

Mirsky, Robert, A.M. (Johns Hopkins Univ.) Graduate student at Columbia Univ.,
7 West 703th Street, Shanks Village, Orangeburg, New York.

Mulhall, Harold, B.Sc. (Sydney) Lecturer in Mathematics, Department of
Mathematics, University of Sydney, Australia.

Palm, Conny, Ph.D. (Stockholm) Docent, Ynglingar 11, Djursholm, Sweden.

Pease, Katharine, A.M. (Smith College) Instructor in Psychology, Barnard College,
Columbia University, New York 27, New York.

Peckham, Cyril G., M.S. (Univ. of Ill.) Assistant Professor of Mathematics, University
of Dayton, Dayton 9, Ohio.

Peterson, Raymond P., Jr., B.A. (Univ. of Calif., Los Angeles) Assistant in
Mathematics, University of California, Los Angeles, Calif., 10729 Ashton Ave., Los Angeles
21, California.

Pike, Eugene W., Ph.D. (Princeton) Member, McFarlan, Groth & Pike, 510 Audubon
Ave., New York S3, New York.

Pitman, Edwin J. G., M.A. (Univ. of Melbourne) Professor of Mathematics, Univ. of
Tasmania, Hobart, Tasmania.

Rigby, Fred D., Ph.D. (Univ. of Iowa) Mathematician, Office of Naval Research, P.O.
Box 234, Falls Church, Virginia.

Smith, C. DeWitt, Ph.D. (Univ. of [...]) Associate Professor of Statistics, Box
2686, University, Alabama.

Srinivasan, T. K., M.A. (Madras) Assistant Lecturer, Mathematics Department,
Raja's College, Pudukkottah, S-I-R, South India.

[...], Control Analyst, 4124 Ivanrest Road, Grandville, Michigan.

[...] (Univ. of Calif., Berkeley) Associate, School of Public Health,
3042 Wheeler St., Berkeley, California.
Trindade, Mario, Chief of the Statistical Division of the Instituto de Resseguros do Brasil,
Rua Senador Soares S3, ap. SOi, Rio de Janeiro, Brasil.

Von Schelling, Hermann, Ph.D. (Univ. of Berlin) Naval Medical Research
Laboratory, U. S. Submarine Base, New London, Conn.

Whidden, Phillips, A.B. (Harvard) Part-time Instructor in Mathematics, Carnegie
Institute of Technology, Pittsburgh 13, Pa.

Wolman, William, B.B.A. (College of City of New York) Statistician, New York State
Division of Housing, 295 Parkside Avenue, Brooklyn 26, New York.

Woodbury, Lowell A., Ph.D. (Univ. of Michigan) Assistant Professor of Physiology,
Dept. of Physiology, University of Utah Medical School, Salt Lake City 1, Utah.

Yusuf, Mohammad, M.A. (Aligarh Muslim Univ., India) Graduate student at Columbia
University, 20S, Furnald Hall, Columbia University, New York 27, New York.



REPORT ON THE BERKELEY MEETING OF THE INSTITUTE 

The thirtieth meeting of the Institute of Mathematical Statistics was held in
Berkeley, California on Monday and Tuesday, December 22 and 23, 1947. The
meeting was attended by approximately 70 persons including the following 31
members of the Institute:

G. A. Baker, G. G. Beckstead, B. M. Bennett, R. U. Bonner, Frances L. Campbell, F. L.
Crow, Dorothy Cruden, W. J. Dixon, R. Dorfman, G. G. Eldredge, E. A. Fay, Evelyn Fix,
M. A. Girshick, J. Gurland, T. E. Harris, W. L. Hart, J. L. Hodges, Jr., P. G. Hoel, H. M.
Hughes, T. A. Jeeves, H. S. Konijn, G. M. Kuznets, E. L. Lehmann, R. B. Leipnik, J.
Neyman, Gladys Rappaport, H. Scheffé, T. W. Simpson, C. M. Stein, J. E. Walsh and H.
Working.

The Monday morning program, with Professor J. Neyman presiding, consisted
of the following contributed papers:

1. The Performance Characteristic of Certain Methods for Obtaining Confidence Intervals.
Mr. B. M. Bennett, University of California, Berkeley.

2. Some Further Results on the Bernoulli Process.
Dr. T. E. Harris, Douglas Aircraft Company.

3. Most Powerful Tests of Composite Hypotheses I. Normal Distributions.

Dr. E. L. Lehmann and Dr. C. M. Stein, University of California, Berkeley.

4. On the Selection of Forecasting Formulas.

Professor P. G. Hoel, University of California, Los Angeles.

The Monday afternoon program, with Professor H. Scheffé presiding, also
consisted of contributed papers as follows:

1. On the Power Function of the “Best” t-test Solution of the Behrens-Fisher Problem.

Dr. J. E. Walsh, Douglas Aircraft Company.

2. On Sequences of Experiments.

Dr. C. M. Stein, University of California, Berkeley.

3. The Effect of Selection above Definite Lower Limits of Linear Functions of Normally
Distributed Correlated Variables on the Means and Variances of Other Linear Functions.
Professor G. A. Baker, University of California, Davis.

4. An Inversion Formula for the Distribution of a Ratio of Random Variables.

Dr. J. Gurland, University of California, Berkeley.

5. Independence of Parameters and Sufficient Statistics.

Dr. E. W. Barankin, University of California, Berkeley.

The Tuesday morning session, with Professor R. A. Gordon presiding, was
devoted to the following invited and contributed papers on econometrics:

1. Remarks on the Theory of Indices.
Professor G. C. Evans, University of California, Berkeley.

2. Interrelations of Theory and Statistical Research in Economics.

Professor H. Working, Stanford University.

3. Statistical and Case Methods in a Study of Labor Mobility.

Professor D. McEntire, University of California, Berkeley.

Discussion: Dr. M. Lipton, University of California, Berkeley.

4. Distributions Associated with Continuous Stochastic Processes.
Dr. R. B. Leipnik, University of California, Berkeley.

5. On Some Methods of Evaluating Railway Costs (By title).

Miss Evelyn Fix, University of California, Berkeley.

There was a dinner on Monday evening for members and guests at the Hotel 
Claremont and an informal discussion and coffee on Tuesday afternoon. 


REPORT ON THE NEW YORK MEETING OF THE INSTITUTE 

The Tenth Annual Meeting of the Institute of Mathematical Statistics was
held at the Commodore Hotel, New York City, on December 28-30, 1947. The
meeting was held in conjunction with the American Statistical Association.
The following 173 members of the Institute were in attendance:

F. S. Acton, R. L. Anderson, H. E. Arnold, L. A. Aroian, M. Astrachan, H. M. Baldwin,
W. D. Baten, R. E. Bechhofer, G. W. Beebe, M. H. Belz, A. A. Bennett, A. J. Berman, A.
Blake, C. I. Bliss, P. Boschan, A. H. Bowker, A. E. Brandt, T. H. Brown, M. A. Brumbaugh,
M. C. Bruyere, P. T. Bruyere, T. A. Budne, R. W. Burgess, R. S. Burington, B. H. Camp,
G. C. Campbell, P. G. Carlson, Jr., U. Chand, H. Chernoff, Kai-Lai Chung, P. C. Clifford,
W. G. Cochran, D. D. Cody, J. Cornfield, G. M. Cox, J. H. Curtiss, J. F. Daly, G. B. Dantzig,
D. G. Delhi, H. F. Dorn, A. J. Duncan, C. W. Dunnett, D. Durand, J. Dutka, P. S.
Dwyer, G. L. Edgett, C. Eisenhart, B. Epstein, M. W. Eudey, W. D. Evans, Will Feller,
C. D. Ferris, C. B. Fine, M. M. Flood, L. R. Frankel, J. E. Freund, B. Friedman, Hilda
Geiringer, M. A. Geisler, H. H. Germond, M. A. Girshick, Abraham Golub, C. H. Graves,
S. W. Greenhouse, J. A. Greenwood, T. N. E. Greville, J. I. Griffin, E. J. Gumbel, M.
Gurney, K. W. Halbert, Max Halperin, M. H. Hansen, T. E. Harris, B. Harshbarger,
Alex Hart, P. M. Hauser, J. D. Heide, L. H. Herbach, M. W. Hirsch, Harold Hotelling, H.
M. Humes, C. C. Hurd, S. Jablon, C. M. Jaeger, A. S. Kaitz, Leo Katz, T. L. Kelley, L. S.
Kellogg, L. F. Knudsen, A. K. Kury, Jack Laderman, M. LeLeika, Joseph Lev, Howard
Levene, J. E. Lieberman, Julius Lieblein, S. B. Littauer, Eugene Lukacs, Geo. A. Lundberg,
J. C. McPherson, Benjamin Malzberg, Sophie Marcuse, E. S. Marks, H. C. Mathisen, J.
W. Mauchly, A. L. Mayerson, Margaret Merrell, E. B. Mode, E. C. Molina, M. E. Moore,
D. J. Morrow, J. E. Morton, Jack Moshman, Hugo Muench, D. N. Nanda, M. G. Natrella,
Doris Newman, G. E. Nicholson, Jr., Harold Nisselson, Nilan Norris, H. W. Norton, P. S.
Olmstead, A. L. O’Toole, A. E. Paull, C. N. Payne, Katharine Pease, M. P. Peisakoff, E.
W. Pike, O. A. Pope, G. B. Price, L. J. Reed, J. S. Rhodes, S. F. Robinson, A. C. Rosander,
Ernest Rubin, P. J. Rulon, Rose Sachs, Frank Saidol, Arthur Sard, M. M. Sandomire, F.
E. Satterthwaite, E. D. Schell, Bernice Scherl, O. N. Serbein, R. G. Seth, Harry Shulman,
Rosedith Sitgreaves, C. DeW. Smith, G. W. Snedecor, Herbert Solomon, D. E. South,
Arthur Stein, G. T. Steinberg, Joseph Steinberg, A. I. Sternhell, S. A. Stouffer, J. V.
Sturtevant, B. R. Suydam, W. R. Thompson, Gerhard Tintner, J. W. Tukey, D. F. Votaw, Jr.,
A. J. Wadman, H. M. Walker, Dzung-shu Wei, Sidney Weiner, Samuel Weiss, Sophie R.
Wilkey, R. I. Wilkinson, S. S. Wilks, C. P. Winsor, Jacob Wolfowitz, W. J. Youden.

The first session, a joint session with the American Statistical Association, was
held on the morning of December 28 and was devoted to the topic The Teaching
of Statistics. Professor W. G. Cochran of North Carolina State College presided.
A paper entitled Three Recent Reports Dealing with the Teaching of Statistics,
the Training of Statisticians and the Crisis in Statistical Personnel was presented
by Dr. James D. Paris of the Metropolitan Life Insurance Company. Many
members participated in the general discussion which followed.

The second session on The Teaching of Statistics, also with the American
Statistical Association, was held at 1:15 P.M. Professor Francis G. Cornell of the
University of Illinois was chairman. The main paper of the session was the
paper by Professor George W. Snedecor of Iowa State College entitled Syllabus
for a Proposed Course in Basic Statistics. This was followed by prepared
discussion by Professors Elmer B. Mode, Boston University; Helen M. Walker,
Teachers College, Columbia University; Samuel A. Stouffer, Harvard
University; and Albert E. Waugh, Department of Economics, University of
Connecticut. Many members participated in the general discussion. At the
conclusion of this session, a film on Modern Quality Control was shown by Mr.
Simon Collier of the Johns Manville Company.

Two Monday sessions, also held jointly with the American Statistical Association, and with the cooperation of the Operations Evaluation Group of the
Navy and the Operations Analysis of the Air Force, were devoted to Operations
Research. Professor Edward L. Bowles of Massachusetts Institute of Technology presided at the morning session. The following papers:

1. Operations Research in the Department of the Navy.

Dr. J. Steinhardt, Director, Operations Evaluation Group.

2. Operations Research in the Department of the Air Forces.

Dr. Leroy A. Brothers, Chief, Operations Analysis. 

were followed by discussion by Dr. Arthur A. Brown, Operations Evaluation
Group; Dr. Thomas I. Edwards, Operations Analysis; Professor G. Baley Price,
The University of Kansas and Wartime Operations Analyst; and Dr. W. J.
Youden, Douglas Aircraft Company and Wartime Operations Analyst.

Dr. Merrill M. Flood, Assistant Deputy Director of Research and Development,
General Staff, U. S. Army, presided at the afternoon session. The following
papers were presented:


1. Operations Analysis in the Southwest Pacific Air War.

Dr. Roger I. Wilkinson, Bell Telephone Laboratories and Wartime Operations Analyst.

2. Operations Analysis of Air-Sea Rescue.

Dr. E. S. Lamar, Operations Evaluation Group.

3. Factorial Chi-Square in Test Shooting.

..., Technical Director, Ordnance Laboratory and Wartime Operations Analyst.

4. Mathematical Techniques of Program Planning.

Dr. George Dantzig, Consultant to the Air Comptroller, Headquarters, USAF.

A session on the Application of the Theory of Extreme Values was held jointly
with the American Statistical Association on Tuesday, December 30. Professor
Jacob Wolfowitz of ... University presided. The following papers were presented:

1. Introduction. The Mathematical Theory of Extreme Values.

Professor Richard von Mises, Harvard University.

2. Applications to the Prediction of Flood Flows.

Professor Emil Gumbel, Brooklyn College.

3. Applications to Meteorology.

Dr. Horace Norton, Weather Bureau, Washington, D. C.

4. Applications to Fracture Problems.

Dr. Benjamin Epstein, Coal Research Laboratory, Carnegie Institute of Technology.

The session concluded with discussion by Miss Marion Sandomire, Navy Department,
Bureau of Ships, and Dr. Bradford Kimball, Port Washington, New York.

A session on Statistical Techniques in Life Insurance was held jointly with
the American Statistical Association at 1:15 P.M., December 30. Mr. Robert
J. Myers, Actuarial Consultant, Social Security Administration, was chairman
of the meeting. The following papers were presented:

1. Problems with Sampling Procedures for Reserve Valuations.

Mr. George G. Campbell, Supervisor, Actuarial Division, Metropolitan Life Insurance
Company.

2. Sampling Errors in Life Insurance Mortality and Other Statistics.

Mr. Donald Cody, Assistant Actuary, Equitable Life Assurance Society.

3. Recent Developments in Graduation and Interpolation.

Dr. T. N. E. Greville, National Office of Vital Statistics, U. S. Public Health Service.

A session of contributed papers was held at 3:30 P.M. on December 30. Dr.
T. N. E. Greville of the National Office of Vital Statistics presided. The
following papers were presented:

1. Distribution of the Circular Serial Correlation Coefficient for Residuals from a Fitted
Fourier Series (Preliminary Report).

Professor R. L. Anderson, North Carolina State College, and Professor T. W. Anderson,
Jr., Columbia University.

2. Some New Methods for Distributions of Quadratic Forms.

Professor Harold Hotelling, Institute of Statistics, University of North Carolina.

3. Frequency Functions Defined by the Pearson Difference Equation.

Professor Leo Katz, Michigan State College, East Lansing. 

4. Distribution of the Sum of Roots of a Determinantal Equation Under a Certain Condition.

Mr. D. N. Nanda, Institute of Statistics, University of North Carolina.

5. Applications of Carnap’s Probability Theory to Statistical Inference.

Professor Gerhard Tintner, Department of Economics, Iowa State College.

6. Circular Probable Error of an Elliptical Gaussian Distribution.

Dr. H. H. Germond, S. W. Marshall & Co., Washington, D. C.

The annual business meeting of the Institute was held at 4:30 P.M., December
29, 1947, in the ballroom of the Commodore Hotel. There were reports by the
President; the Secretary-Treasurer; Mr. Morris Hansen, Chairman of the Committee
on Planning and Development; and Dr. John Curtiss, Chairman of the
Program Committee. Mr. Hansen presented a tentative form of the proposed
new constitution, while Dr. Curtiss discussed program plans. There was some
discussion on these general questions from the floor.
Professor A. Wald was elected President, and Dr. Churchill Eisenhart and
Professor Henry Scheffé, Vice-Presidents.


Paul S. Dwyer,
Secretary. 


REPORT ON THE CHICAGO MEETING OF THE INSTITUTE 


The thirty-second meeting of the Institute of Mathematical Statistics was 
held at the Sherman Hotel, Chicago, Monday and Tuesday, December 29-30. 
The meeting was held in conjunction with the one hundred fourteenth meeting 
of the American Association for the Advancement of Science and Co-operating 
Associated Societies. The following twenty-eight members of the Institute 
attended the meeting: 

W. Bartky, D. H. Blackwell, G. M. Brown, I. W. Burr, A. G. Carlton, M. Castellanos,
C. W. Cotterman, A. T. Craig, J. H. Davidson, R. C. Davis, W. E. Deming, M. Elveback,
M. L. Garbuoy, W. W. Gutzman, T. J. Jaramillo, E. S. Keeping, T. C. Koopmans, E. L.
Lahti, M. M. Lavin, K. May, J. A. Pierce, O. Reiersøl, H. Rubin, L. J. Savage, J. Silber,
W. A. Wallis, E. L. Welker and J. W. Wilkins.

The Monday afternoon session was devoted to contributed papers of Section A, 
AAAS, and of the Institute, and to the Vice-Presidential address of Section A. 
The following papers were presented: 

1. On the Boundary Layer Motion along a Periodically Oscillating Plane in Compressible
Viscous Fluids.

Dr. M. Z. Krzywoblocki, University of Illinois.

2. Variations of the Probability of Unfair Election Results.

Dr. Kenneth May, Carleton College.

3. Normal Equations with Nearly Vanishing Determinants.

Dr. M. Herzberger and Dr. R. Norris.

4. Composition of Binary Quadratic Forms.

Professor Gordon Pall, Illinois Institute of Technology.

5. A Proof of the Asymptotic Analogue of the Theorem of Cramér and Rao.

Dr. Herman Rubin, Institute for Advanced Study.

6. The Solution of Differential Equations in the Presence of Turning Points. Vice-Presidential
address of Section A.


The Tuesday afternoon session was also a joint session of Section A and the
Institute, with Dean Walter Bartky of the University of Chicago presiding.
The following two papers were presented upon invitation of the Institute:


1. Application of the Radon-Nikodym Theorem to the Theory of Sufficient Statistics.

Professor P. R. Halmos and Dr. L. J. Savage, University of Chicago.

2. Unbiased Sequential Estimation.

Professor David Blackwell, Howard University.



REPORT OF THE PRESIDENT OF THE INSTITUTE FOR 1947 


The healthy growth of the Institute has continued through 1947. The
membership increased from 900 to 1046. This increase is gratifying as a sign
that more and more people appreciate the usefulness of basic theory and are
ready to support research by making our Annals possible. It is also pleasing
to note that statistical theory and methodology are reaching new fields and
that new groups as a whole are becoming conscious of the usefulness of contact
with mathematical statistics. These developments are reflected in the meetings
of the Institute.

Meetings. The Ninth and Tenth Annual meetings (for 1946 and 1947) were
held in the traditional way in conjunction with the meetings of the American 
Statistical Association (January—Atlantic City and Christmas—New York). 
The Tenth Summer Meeting was held with the American Mathematical Society 
and the Mathematical Association of America (September—Yale). Regional 
meetings were held in California (June—San Diego, December—Berkeley) and 
in Chicago (December), the latter in conjunction with the meetings of the 
American Association for the Advancement of Science (AAAS). Moreover, 
two meetings were organized with specialized programs of interest to groups 
with whom the Institute has not previously had much contact. A meeting 
in April at Columbia University, co-sponsored by the American Mathematical 
Society, was devoted to Stochastic Processes and Random Noise, and another 
meeting held simultaneously at Atlantic City was in conjunction with the meeting 
of the Eastern Psychological Association. It is clear that with such diversified 
meetings the Program Committee could not always act as a unit. J. H. Curtiss
was its Chairman and J. Neyman and J. W. Tukey arranged some of the programs.
Other members of the Committee were: C. W. Churchman, T.
Koopmans, F. C. Mosteller, J. Neyman, H. Scheffé, J. Wolfowitz, and H.
Working.

At the Tenth Summer Meeting A. Wald delivered the first Henry L. Rietz
Memorial Lecture. It is desirable to preserve the solemnity of the occasion 
of the Rietz lectures and it was therefore decided that they should not be given 
every year. Accordingly, no Rietz lecturer has been selected for 1948. 

The Institute had no share in the program of the International Statistical 
Congress in Washington. However, Fellows of the Institute were invited to 
that Congress. This Congress and the Princeton Bi-Centennial were beneficial 
by establishing more intimate personal ties with our European colleagues. It is 
widely felt on both sides of the ocean that a closer cooperation, in particular 
with British statisticians, is highly desirable. Various suggestions in that 
direction were informally discussed in Washington and Princeton and M. G. 
Kendall has kindly consented to explore the practical possibilities. It is needless 
to say that the Institute is eager to do everything possible to promote cooperation 
and increase its usefulness also to our British colleagues. 
Relations with other organizations. It is gratifying to note that the cooperation
of the Institute with sister societies is growing in intensity. The last two
Presidential reports mentioned plans for a reorganization of the American Statistical
Association with a view to more intimate relations among statistical societies.
The revision of the constitution of the Association is not yet completed. It
appears now that the American Mathematical Society also feels the need of closer
collaboration with all groups interested in applied mathematics. It is too early to
predict the results of these movements, but it is clear that we must devote careful
thought to our own organization and to our future relations with other groups.

In 1947 the AAAS organized an Inter-Society Committee for the National
Science Foundation Legislation. At the first meeting in Washington we were
represented by J. H. Curtiss and W. A. Shewhart and at the meeting in December
in Chicago by W. Bartky. In ballots on the two controversial subjects the
Institute voted against exclusion of social sciences and abstained on the question
of patent rights. W. Feller represented the Institute on the Policy Committee
of the American Mathematical Society. Through this Committee the Institute
went on record as favoring the National Science Foundation Bill. Otherwise
the discussions of the Policy Committee were mostly connected with the
establishment of an International Mathematical Union. Cletus O. Oakley
represented the Institute on the Publicity Committee of the American Mathematical
Society, of which he is chairman. G. W. Snedecor was our representative on the
AAAS Council, W. Bartky on the National Research Council, F. C. Mosteller
and S. S. Wilks on the Joint Committee for the Development of Statistical
Application in Engineering and Manufacturing. In recent years the common
interests of the Institute and the actuarial profession have grown in importance
and it has been suggested that closer cooperation would be beneficial to both
parties. A new committee has been established to explore these possibilities
and in particular to arrange a joint meeting during 1948. Members of this
committee are: G. G. Campbell, T. N. E. Greville, C. Fisher, and C. Spoerl,
Chairman.


Internal Work. The growth of the Institute has rendered parts of the
Constitution obsolete and a revision seems indicated. In particular, it appears that
the present system of elections is no longer satisfactory. The Institute is deeply
indebted to its Committee on Planning and Development, which has devoted
much thought and consideration not only to a revision of the Constitution but
also to the future development of the Institute as a whole. The membership
had occasion to discuss the preliminary plans at two business meetings. M. H.
Hansen acted as Chairman of the Committee; other members were: J. H. Curtiss,
W. G. Cochran, J. Neyman, H. W. Norton, F. F. Stephan, J. W. Tukey, W. A.
Wallis.

A sharp increase in printing costs has, unfortunately, necessitated an increase
in membership dues. However, the membership should rest assured that the
financial position of the Institute is intrinsically sound. The cash prospects
for 1948 are not rosy, but this is due principally to the necessity of reprinting
back numbers of the Annals, which in itself is a sign of health and promise of
stability. At present the Institute has a considerable reserve in back numbers
and this reserve is rapidly being transformed into cash. We are also exploring
the possibilities of new revenue and have started a campaign to get
advertisements for the Annals. A possible campaign for institutional members is held
in abeyance pending a clarification of our formal relations with sister societies.
In order to make the Annals available in European countries with monetary
exchange restrictions, the dues and subscriptions have been increased only for
the Western Hemisphere. The investments of the Institute have been
supervised by the Finance Committee, consisting of C. F. Roos, L. A. Knowler, F. F.
Stephan, and Paul S. Dwyer, Chairman.

Last year’s Committee on Teaching completed its work and submitted a
detailed report which will be of great value. It will be published in the Annals
of Mathematical Statistics. The Committee has been dissolved with special
thanks of the Board of Directors for their successful work. H. Hotelling was
chairman and its members were Walter Bartky, W. Edwards Deming, Milton
Friedman, and Paul Hoel. The Committee on Tabulation, under the
chairmanship of C. Eisenhart and consisting of Paul S. Dwyer, H. Goldstine, A. Lowan,
H. W. Norton, and G. R. Stibitz, has outlined the work for the coming years,
which promises to be of great interest.

The Membership Committee consisted of C. C. Craig, P. G. Hoel, and J. H.
Curtiss as Chairman. On its recommendations the following members were
elected Fellows: T. W. Anderson, David Blackwell, Frederick Mosteller, Gerhard
Tintner, Charles P. Winsor, Alexander Aitken, George Darmois, Ragnar Frisch,
Robert C. Geary, and John Wishart. The Nominating Committee consisted of
Meyer A. Girshick, Paul G. Hoel, Horace W. Norton, Frederick Mosteller,
and George W. Snedecor, Chairman. A. Wald was nominated for President,
and as an innovation four nominations for Vice-President were made: C.
Eisenhart, A. M. Mood, Henry Scheffé, and F. F. Stephan.

The Annals of Mathematical Statistics are covered by a special report of the 
Editor. However, it is appropriate to say that the Institute takes pride in the 
development of the Annals. While members see only its spectacular success, 
they should bear in mind that this is mostly due to the work of one man, S. S.
Wilks. In view of the great variety of interests of our membership and the 
many desirable directions in which the Annals could develop, it is clear that 
the work of the Editor can not always be pleasing and naturally often means a 
nervous burden. I feel sure that I speak for all our members in expressing the 
Institute’s sincere thanks to S. S. Wilks not only for his work but also for his 
wisdom in striking a sensible balance between many wishes and possibilities 
and leading the Annals so successfully in a direction satisfactory to all of us. 

In thanking all other members who have contributed to the work of the
Institute, it is hard to find appropriate words to express appreciation for the
unselfish efforts and devotion of our Secretary-Treasurer. Few members will
realize how much of Dwyer’s time and thoughts are spent for the Institute
and how much the smooth running of the affairs of the Institute is due to his
hard work.

Finally, it is a pleasant duty to express our thanks and appreciation to
Princeton University and to the University of Michigan. These institutions have
generously provided office space and other help which has greatly facilitated
our work and saved us expenses.

Will Feller, 
President, 1947. 

December 31, 1947. 



REPORT OF THE SECRETARY-TREASURER OF THE 
INSTITUTE FOR 1947 


At the beginning of 1947 the Institute had 900 members and during 1947, 
210 new members (10 of which begin their membership with 1948) joined the 
Institute. During 1947 the Institute lost 73 members, 43 by resignation, 25 
by suspension for non-payment of dues, and 5 by death. The Institute has 
1,037 members as it starts 1948. 

The following members died during the year: 

Margaret J. Dix
Professor Irving Fisher
Albert M. Freeman
Professor Henry A. Ruger
Professor James G. Smith

A summary of the financial transactions of the Institute is given in the
Financial Statement for 1947 which follows:


FINANCIAL STATEMENT 

December 31, 1946 to December 31, 1947 


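A. Receipts

Balance on Hand,* December 31, 1946 ...............   $7,241.55
Dues ..............................................    5,054.43
Life Membership Payments ..........................      287.50
Subscriptions .....................................    2,892.93
Sale of Back Numbers ..............................    3,969.95
Net Income from Investments .......................       63.00
Miscellaneous .....................................       76.56
Total .............................................  $19,585.92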


B. Expenditures

Annals, Current
    Office of Editor ..............................     $160.40
    Waverly Press .................................    7,145.79   $7,306.19
Annals, Back Numbers
    Reprinted 600 copies each of Vol. III #1 & 2; IV #2;
    V #2; VII #4; XI #1 & 4; XII #1; XIV #1, 2 & 3     3,039.00
    Iowa City Office ..............................      143.75    3,182.75
Mathematical Reviews and Inter-Society Committee for the
    National Science Foundation ...................                  135.00
Office of the Secretary-Treasurer
    Printing, memoranda, etc. (including some stamped
      envelopes) ..................................    1,100.49
    Postage, supplies, express, telephone calls and
      cables ......................................      400.00
    Clerical help .................................    1,502.31    3,002.80
Miscellaneous .....................................                  100.81
Balance on Hand,* December 31, 1947 ...............                5,858.37
Total .............................................              $19,585.92

* In bank deposits and government bonds.

C. Summary of Receipts and Expenditures

Balance on Hand,* December 31, 1946 ...............   $7,241.55
Receipts during 1947 ..............................   12,344.37
Expenditures during 1947 ..........................   13,727.55
Balance on Hand,* December 31, 1947 ...............    5,858.37
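The arithmetic of Sections A to C can be checked mechanically. Below is a minimal Python sketch, offered as an editorial illustration rather than anything in the original report, that re-adds the itemized receipts and expenditures with exact decimal arithmetic. Several digits in the scanned statement are hard to read, so the item values used here are assumptions: they are the readings under which the opening balance ($7,241.55), total expenditures ($13,727.55) and closing balance ($5,858.37) agree with one another.

```python
from decimal import Decimal as D

# Section A: receipts during 1947 (the opening balance is kept separate).
receipts = {
    "Dues": D("5054.43"),
    "Life Membership Payments": D("287.50"),
    "Subscriptions": D("2892.93"),
    "Sale of Back Numbers": D("3969.95"),
    "Net Income from Investments": D("63.00"),
    "Miscellaneous": D("76.56"),
}

# Section B: expenditures during 1947, grouped as in the statement.
# Digit readings for the Secretary-Treasurer items and Miscellaneous are assumptions.
expenditures = {
    "Annals, Current": D("160.40") + D("7145.79"),
    "Annals, Back Numbers": D("3039.00") + D("143.75"),
    "Mathematical Reviews and Inter-Society Committee": D("135.00"),
    "Office of the Secretary-Treasurer": D("1100.49") + D("400.00") + D("1502.31"),
    "Miscellaneous": D("100.81"),
}

opening_balance = D("7241.55")  # Balance on Hand, December 31, 1946
total_receipts = sum(receipts.values())
total_expenditures = sum(expenditures.values())
closing_balance = opening_balance + total_receipts - total_expenditures

print(total_receipts)                    # 12344.37
print(opening_balance + total_receipts)  # 19585.92, the Total of Sections A and B
print(total_expenditures)                # 13727.55
print(closing_balance)                   # 5858.37
```

Using `Decimal` rather than floats keeps every intermediate sum exact to the cent, so any mismatch would point to a misread figure rather than to rounding.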


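D. Comparison of Assets on December 31, 1946 and December 31, 1947

                                                    Dec. 31, 1946   Dec. 31, 1947
U. S. Government G Bonds ..........................     $5,000.00       $3,000.00
Life Membership Funds, in bonds ...................      1,888.00        1,888.00
Life Membership Funds, in bank deposits ...........        139.50          427.00
Additional Bank Deposits ..........................        214.05          543.37
Current Accounts Receivable .......................        452.62          423.55
Estimated Value (Cost) of back issues of Annals** .      7,234.58       10,866.73
Total .............................................    $14,928.75      $17,148.65
Net Gain, 1947 ....................................                      2,219.90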
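Section D can be cross-checked the same way. The sketch below, again a hypothetical editorial check, totals both columns and separates the cash lines (bonds, Life Membership Funds and bank deposits), whose sum should reproduce the Balance on Hand figures. As an assumption, the 1946 receivable is read as $452.62, the reading under which the printed net gain of $2,219.90 is reproduced.

```python
from decimal import Decimal as D

# Asset lines of Section D: (value on Dec 31, 1946, value on Dec 31, 1947).
# Readings of hard-to-read digits are assumptions chosen so the printed totals agree.
assets = {
    "U.S. Government G Bonds": (D("5000.00"), D("3000.00")),
    "Life Membership Funds (bonds)": (D("1888.00"), D("1888.00")),
    "Life Membership Funds (bank deposits)": (D("139.50"), D("427.00")),
    "Additional Bank Deposits": (D("214.05"), D("543.37")),
    "Current Accounts Receivable": (D("452.62"), D("423.55")),
    "Back issues of Annals (at cost)": (D("7234.58"), D("10866.73")),
}

# The first four lines are cash: bonds plus bank deposits.
CASH_LINES = list(assets)[:4]


def column_total(year_index, names):
    """Sum one year's column (0 = 1946, 1 = 1947) over the named asset lines."""
    return sum(assets[name][year_index] for name in names)


total_1946 = column_total(0, assets)
total_1947 = column_total(1, assets)
cash_1946 = column_total(0, CASH_LINES)
cash_1947 = column_total(1, CASH_LINES)
stock_1946, stock_1947 = assets["Back issues of Annals (at cost)"]
stock_change = stock_1947 - stock_1946

print("Total assets:", total_1946, total_1947)         # Total assets: 14928.75 17148.65
print("Net gain 1947:", total_1947 - total_1946)       # Net gain 1947: 2219.90
print("Drop in cash position:", cash_1946 - cash_1947)  # Drop in cash position: 1383.18
print("Increase in back-issue stock:", stock_change)   # Increase in back-issue stock: 3632.15
```

The net gain then decomposes as the rise in back-issue stock less the falls in cash and in receivables: 3,632.15 - 1,383.18 - 29.07 = 2,219.90.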


E. Liabilities of Institute of Mathematical Statistics as of December 31, 1947

All bills which have been presented have been paid. The Life Membership Fund now
contains $2,315.00, which covers 30 members. Also $3,348.11 has been paid in for 1948
(and later) dues and subscriptions.


The increase in the size of the Annals from 500 to 600 pages and the
phenomenal activity in the sales of back numbers are the two most important factors
to be considered in comparing the 1947 statement with those of previous years.
The Waverly Press bills for 1946 totalled $4,566.27, while the corresponding
amount for 1947 was $7,145.79, an increase of 56%. The increase is attributable
not only to the increased size of the Annals but also to the fact that printing
costs are rising rapidly and, to a lesser extent, to the fact that we are printing a
larger number of copies. It is to be noted that the cost of the Annals alone in
1947 was over $2,000 more than the amount received from dues. As a result
of the increase in dues, the 1948 report should be more satisfactory in this respect.
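Both figures in the paragraph above follow from the statement. A minimal sketch, using the Waverly Press bills quoted in the text and assuming that "the cost of the Annals alone" means the Annals, Current expenditure (Office of Editor plus Waverly Press):

```python
from decimal import Decimal as D

waverly_1946 = D("4566.27")  # Waverly Press bills, 1946
waverly_1947 = D("7145.79")  # Waverly Press bills, 1947
# Assumption: "cost of the Annals alone" = Office of Editor + Waverly Press for 1947.
annals_current_1947 = D("160.40") + D("7145.79")
dues_1947 = D("5054.43")

increase_pct = (waverly_1947 - waverly_1946) / waverly_1946 * 100
shortfall = annals_current_1947 - dues_1947

print(f"Printing cost increase: {increase_pct:.2f}%")      # Printing cost increase: 56.49%
print(f"Annals cost in excess of dues: ${shortfall:,.2f}")  # Annals cost in excess of dues: $2,251.76
```

The 56.49% rounds to the 56% quoted, and the $2,251.76 gap is consistent with "over $2,000".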

The phenomenal sales of back issues, noted in the report for 1946, were
accelerated in 1947. We sold nearly $4,000 of back issues. These extensive
sales were embarrassing to our cash position since they exhausted many of our
issues and the continued reprinting forced us to place a considerable portion of
our reserves in inventory (some of which probably will not be returned to cash
within decades). Eleven issues were reprinted during the first six months of
1947. The resulting low cash position forced a temporary change in the policy
of reprinting issues as they became exhausted.

** Cost of Annals calculated at 67 cents per copy.

It was necessary to cash two $1000 interest-bearing G bonds to meet the
Waverly and reprinting bills as they came due. These brought $1938.00 rather
than $2000, as they have been valued in previous reports. As the income from
bonds during the year was $125, I have entered the net income from investments,
after deducting the $62 loss on the redemption, as $63.00.
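The net investment income is the bond interest less the discount taken when the two bonds were cashed. A minimal sketch of that calculation, assuming, as the paragraph indicates, that both bonds had been carried at their $1000 face value:

```python
from decimal import Decimal as D

interest_received = D("125.00")     # income from bonds during 1947
face_value = 2 * D("1000.00")       # two G bonds carried at face value in earlier reports
redemption_proceeds = D("1938.00")  # cash actually received when they were cashed

loss_on_redemption = face_value - redemption_proceeds
net_investment_income = interest_received - loss_on_redemption

print(loss_on_redemption)     # 62.00
print(net_investment_income)  # 63.00
```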

An attempt has been made to keep down the costs of the office of the Secretary- 
Treasurer. The expense for 1947 was about $100 more than the expense for 
1946 and seems very satisfactory in view of the larger membership and greatly 
increased costs of all materials and services. 

For the reasons indicated above, the cash position (including bonds and Life
Membership payments) was lowered during the year by $1,383.18. This is
compensated for by an increase in the value of the stock of back issues (valued
at cost) of $3,632.15. Some members of the finance committee feel that it is
improper to list all of this stock as assets since we can probably sell only a portion 
of it in the next five or ten years. However, we did sell nearly $4,000 of Annals 
in 1947 and it is indicated (at the new prices) that the sales of issues we have 
now on hand will yield us $11,000 in the next five to ten years. 

Many of the issues which were stored in Iowa City have been sold and
Professor Knowler has sent the remaining issues to Ann Arbor. I wish to
acknowledge the work of Professor Knowler in caring for these issues and to express the
appreciation of the Institute for his efforts over a period of years. I also wish
to express my appreciation to Mr. Carl Bennett, who contributed much time
and energy in looking after the back issues at Ann Arbor.

This report does not cover the amount of $390.20 which is held temporarily
by the Institute for the fund for Annals for Countries Devastated by War. 
Arrangements are being made to purchase Annals for certain institutions which 
the Committee is recommending. 

Paul S. Dwyer, 
Secretary-Treasurer . 

December 31, 1947. 



REPORT OF THE EDITOR FOR 1947 


During the past year the increase in the number of manuscripts submitted
to the Annals has continued. More manuscripts have been received from
foreign countries than in any preceding year. During 1947 papers were
published by authors in Argentina, Australia, Canada, England, France and Sweden.
If manuscripts continue to be received at the present rate it will not be possible
to publish them in the Annals without further expansion. The gap between
receipt of manuscripts and publication is likely to become serious by the end
of 1948. The 1947 volume of the Annals contained 50 papers, of which 25 were
short notes. The total number of pages printed was 618, representing an
increase of approximately 11% over the size of the 1946 volume. It now appears
that increased printing costs will prevent a further increase in the size of the
Annals for 1948. It is therefore extremely important that authors submitting
papers to the Annals make every effort to keep their papers as brief as possible.

Contributions to probability and statistical theory are continuing to come 
in from a wide variety of fields. They were written by biologists, chemists, 
economists, mathematical statisticians, mathematicians and physicists,
representing universities, government agencies and laboratories, and business and
industrial organizations. Some of these contributions are rather heterogeneous
in quality of results and presentation. However, patient attempts are being
made to have all papers with novel and interesting results suitably revised and
published. Attempts to have expository papers prepared are being continued.

The Editor wishes to take this opportunity to acknowledge, on behalf of the
Editorial Committee, the generous refereeing assistance which has been given
by the following persons: L. A. Aroian, Z. W. Birnbaum, David Blackwell,
A. H. Bowker, I. W. Burr, G. W. Brown, K. L. Chung, W. J. Dixon, T. N. E.
Greville, F. E. Grubbs, J. B. S. Haldane, T. E. Harris, C. Hastings, L. Henkin,
G. A. Hunt, B. F. Kimball, T. Koopmans, S. Kullback, E. L. Lehmann, H.
Levene, H. B. Mann, P. J. McCarthy, W. E. Milne, R. Otter, M. P. Peisakoff,
H. E. Robbins, L. J. Savage, F. F. Stephan, D. F. Votaw, and J. E. Walsh.
The Editor is also indebted to the following persons at Princeton University
for preparation of manuscripts for the printer, and other editorial and office
assistance: Miss Jacqueline G. Foster, M. F. Freeman and J. E. Walsh.


December 31, 1947. 


S. S. Wilks,

Editor, 
CONSTITUTION AND BY-LAWS
OF THE 

INSTITUTE OF MATHEMATICAL STATISTICS 


Constitution 

ARTICLE I 
Name and Purpose 

1. This organization shall be known as the Institute of Mathematical Statistics.

2. Its object shall be to promote the interests of mathematical statistics. 

ARTICLE II 

Membership 

1. The membership of the Institute shall consist of Members, Fellows, Honorary
Members, and Sustaining Members.

2. Voting members of the Institute shall be (a) the Fellows, and (b) all others, Junior 
members excepted, who have been members for twenty-three months prior to the date
of voting. 

3. No person shall be a Junior Member of the Institute for more than a limited term 
as determined by the Committee on Membership and approved by the Board of Directors. 

ARTICLE III 

Officers, Board of Directors, and Committee on Membership

1. The Officers of the Institute shall be a President, two Vice-Presidents, and a
Secretary-Treasurer. The terms of office of the President and Vice-Presidents shall be one
year and that of the Secretary-Treasurer three years. Elections shall be by majority
ballots at Annual Meetings of the Institute. Voting may be in person or by mail.

(a) Exception. The first group of Officers shall be elected by a majority vote of the 
individuals present at the organization meeting, and shall serve until December 31, 1936.

2. The Board of Directors of the Institute shall consist of the Officers, the two previous
Presidents, and the Editor of the Official Journal of the Institute. 

3. The Institute shall have a Committee on Membership composed of a Chairman and 
three Fellows. At their first meeting subsequent to the adoption of this Constitution, the 
Board of Directors shall elect three members as Fellows to serve as the Committee on 
Membership, one member of the Committee for a term of one year, another for a term 
of two years, and another for a term of three years. Thereafter the Board of Directors 
shall elect from among the Fellows one member annually at their first meeting after their 
election for a term of three years. The President shall designate one of the
Vice-Presidents as Chairman of this Committee.

ARTICLE IV 
Meetings 

1. A meeting for the presentation and discussion of papers, for the election of Officers,
and for the transaction of other business of the Institute shall be held annually at such
time as the Board of Directors may designate. Additional meetings may be called from
time to time by the Board of Directors and shall be called at any time by the President
upon written request from ten Fellows. Notice of the time and place of meeting shall
be given to the membership by the Secretary-Treasurer at least thirty days prior to the
date set for the meeting. All meetings except executive sessions shall be open to the
public. Only papers accepted by a Program Committee appointed by the President
may be presented to the Institute.

2. The Board of Directors shall hold a meeting immediately after their election and
again immediately before the expiration of their term. Other meetings of the Board may
be held from time to time at the call of the President or any two members of the Board.
Notice of each meeting of the Board, other than the two regular meetings, together with
a statement of the business to be brought before the meeting, must be given to the
members of the Board by the Secretary-Treasurer at least five days prior to the date set
therefor. Should other business be passed upon, any member of the Board shall have the
right to reopen the question at the next meeting.

3. Meetings of the Committee on Membership may be held from time to time at the
call of the Chairman or any member of the Committee provided notice of such call and
the purpose of the meeting is given to the members of the Committee by the
Secretary-Treasurer at least five days before the date set therefor. Should other business be passed
upon, any member of the Committee shall have the right to reopen the question at the
next meeting. Committee business may also be transacted by correspondence if that
seems preferable.

4. At a regularly convened meeting of the Board of Directors, four members shall
constitute a quorum. At a regularly convened meeting of the Committee on
Membership, two members shall constitute a quorum.

ARTICLE V 

Publications 

1. The Annals of Mathematical Statistics shall be the Official Journal for the Institute.
The Editor of the Annals of Mathematical Statistics shall be a Fellow appointed by the 
Board of Directors of the Institute. The term of office of the Editor may be terminated 
at the discretion of the Board of Directors. 

2. Other publications may be originated by the Board of Directors as occasion arises. 

ARTICLE VI 

Expulsion or Suspension 

1. Except for non-payment of dues, no one shall be expelled or suspended except by 
action of the Board of Directors with not more than one negative vote. 

ARTICLE VII 

Amendments 

1. This constitution may be amended by an affirmative two-thirds vote at any regularly
convened meeting of the Institute provided notice of such proposed amendment
shall have been sent to each voting member by the Secretary-Treasurer at least thirty 
days before the date of the meeting at which the proposal is to be acted upon. Voting 
may be in person or by mail. 
By-laws

ARTICLE I 

Duties of the Officers, the Editor, Board of Directors, and
Committee on Membership 

1. The President, or in his absence, one of the Vice-Presidents, or in the absence of
the President and both Vice-Presidents, a Fellow selected by vote of the Fellows present
shall preside at the meetings of the Institute and of the Board of Directors. At meetings
of the Institute, the presiding officer shall vote only in the case of a tie, but at meetings
of the Board of Directors he may vote in all cases. At least three months before the date
of the annual meeting, the President shall appoint a Nominating Committee of three
members. It shall be the duty of the Nominating Committee to make nominations for
Officers to be elected at the annual meeting and the Secretary-Treasurer shall notify all
voting members at least thirty days before the annual meeting. Additional nominations
may be submitted in writing, if signed by at least ten Fellows of the Institute, up to the
time of the meeting.

2. The Secretary-Treasurer shall keep a full and accurate record of the proceedings at
the meetings of the Institute and of the Board of Directors, send out calls for said
meetings and, with the approval of the President and the Board, carry on the
correspondence of the Institute. Subject to the direction of the Board, he shall have
charge of the archives and other tangible and intangible property of the Institute and
upon the direction of the Board he shall publish in the Annals of Mathematical Statistics
a classified list of all Members and Fellows of the Institute. He shall send out calls for
annual dues and acknowledge receipt of same, pay all bills approved by the President for
expenditures authorized by the Board or the Institute; keep a detailed account of all
receipts and expenditures, prepare a financial statement at the end of each year and
present an abstract of the same at the annual meeting of the Institute after it has been
audited by a Member or Fellow of the Institute appointed by the President as Auditor.
The Auditors shall report to the President.

3. Subject to the direction of the Board, the Editor shall be charged with the
responsibility for all editorial matters concerning the editing of the Annals of
Mathematical Statistics. He shall, with the advice and consent of the Board, appoint an
Editorial Committee of not less than twelve members to co-operate with him; four for a
period of five years, four for a period of three years, and the remaining members for a
period of two years, appointments to be made annually as needed. All appointments to
the Editorial Committee shall terminate with the appointment of a new Editor. The
Editor shall serve as editorial adviser in the publication of all scientific monographs and
pamphlets authorized by the Board.

4. The Board of Directors shall have charge of the funds and of the affairs of the
Institute, with the exception of those affairs specifically assigned to the President or to the
Committee on Membership. The Board shall have authority to fill all vacancies ad
interim, occurring among the Officers, Board of Directors, or in any of the Committees.
The Board may appoint such other committees as may be required from time to time to
carry on the affairs of the Institute. The power of election to the different grades of
Membership, except the grades of Member and Junior Member, shall reside in the Board.

5. The Committee on Membership shall prepare and make available through the
Secretary-Treasurer an announcement indicating the qualifications requisite for the
different grades of membership. The Committee shall review these qualifications
periodically and shall make such changes in these qualifications and make such
recommendations with reference to the number of grades of membership as it deems
advisable. The power to elect worthy applicants to the grades of Member and Junior
Member shall reside in the Committee, which may delegate this power to the
Secretary-Treasurer, subject to such reservations as the Committee considers
appropriate. The Committee shall make recommendations to the Board of Directors with
reference to placing members in other grades of membership. The Committee shall give
its attention to the question of increasing the number of applicants for membership and
shall advise the Secretary-Treasurer on plans for that purpose.


ARTICLE II 


Dues 


1. Members shall pay seven dollars at the time of admission to membership and shall
receive the full current volume of the Official Journal. Thereafter, Members and Fellows
shall pay seven dollars annual dues. Honorary members shall be exempt from all dues.

A Sustaining Member shall pay annual dues of a multiple of one hundred dollars. 

An approved nominee of a Sustaining Member shall be a member in good standing 
without payment of dues for each year in which he is nominated provided that in that 
year he has been a member for less than three years.

(a) Exception. In the case that two Members of the Institute are husband and wife 
and they elect to receive between them only one copy of the Official Journal, their dues
shall each be reduced by twenty-five per cent. 

(b) Exception. Any Member or Fellow may make a single payment which will be 
accepted by the Institute in place of all succeeding annual dues and which will not
otherwise alter his status as a Member or Fellow and will be based upon a suitable table and
rate of interest, to be specified by the Board of Directors. 

(c) Exception. Any Member or Fellow of the Institute serving, except as a
commissioned officer, in the Armed Forces of the United States, or of a friendly power, will,
upon notification to the Secretary-Treasurer, be excused from the payment of dues until the
January first following his discharge from service or his commissioning as an officer. He
shall have all privileges of membership except that he shall not receive the Official Journal.
However, during the first year of his resumed membership he may elect to receive one
copy of each volume of the Official Journal published during the period of his service
membership by paying one-half of the total of dues excused.

(d) Exception. Anyone who resides outside the Western Hemisphere shall pay five 
dollars annual dues. 


2. Annual dues shall be payable on the first day of January of each year. 

3. [...] dollars of the annual dues of each Member and Fellow shall be for a
subscription to the Official Journal. Fifteen dollars of the dues of each Sustaining Member
shall be for two subscriptions to the Official Journal, and the binding of one copy.

4. For each one hundred dollars of dues, a Sustaining Member shall be entitled
to nominate two persons for membership in the Institute.

5. It shall be the duty of the Secretary-Treasurer to notify by mail anyone whose dues
may be six months in arrears, and to accompany such a notice by a copy of this article.

6. If such dues are not paid within three months of the date of mailing such
notice, the Secretary-Treasurer shall report the delinquent to the Board of Directors.
The Board of Directors may strike the delinquent's name from the rolls and withdraw
all privileges of membership, and may reinstate the delinquent upon payment of arrears
of dues.


ARTICLE III 
Salaries 

1. The Institute shall not pay a salary to any Officer, Director, or member of any 
committee.


ARTICLE IV 
Amendments 

1. These By-Laws may be amended in the same manner as the Constitution or by a
majority vote at any regularly convened meeting of the Institute, if the proposed
amendment has been previously approved by the Board of Directors.
DISCRIMINANT FUNCTIONS WITH COVARIANCE

By W. G. Cochran and C. I. Bliss

North Carolina State College; Connecticut Agricultural Experiment Station and

Yale University

1. Summary. This paper discusses the extension of the discriminant function
to the case where certain variates (called the covariance variates) are known
to have the same means in all populations. Although such variates have no
discriminating power by themselves, they may still be utilized in the discriminant
function.

The first step is to adjust the discriminators by means of their ‘within-sample’
regressions on the covariance variates. The discriminant function is then
calculated in the usual way from these adjusted variates. The standard tests of
significance for the discriminant function (e.g. Hotelling's $T^2$ test) can be
extended to this case without difficulty. A measure is suggested of the gain in
information due to covariance, and the computations are illustrated by a numerical
example. The discussion is confined to the case where only a single function
of the population means is being investigated.

2. Introduction. Discriminant function analysis is now fairly well advanced
for the case where there are only two populations. The data consist of a number
of measurements, called the discriminators, that have been made on each member
of a random sample from each population. The technique has various uses.
Fisher [1] used it in seeking a linear function of the measurements that could be
employed to classify new observations into one or other of the two populations.
He pointed out [2] that a test of significance of the difference between the two
samples, developed from his discriminant, was identical with Hotelling's
generalization of Student's $t$ test, discovered some years earlier [3]. Mahalanobis'
concept of the generalized distance between two populations [4] was also found to
be closely related to the discriminant function. In any of these applications
(to classification, testing significance, or estimating distance) we may also be
interested in considering whether certain of the measurements really contribute
anything to the purpose at hand, and helpful tests of significance are available
for this purpose.

Recently the authors encountered a problem in which it seemed advisable to
combine discriminant function analysis with the analysis of covariance. This
case occurs whenever, in addition to the discriminators, there is a measurement
whose mean is known to be the same in both populations. Suppose, for example,
that the I.Q.'s of each of a sample of students are measured. The sample is
then divided at random into two groups, each of which subsequently receives a
different type of training. Measurements made at the end of the period of
training would be potential discriminators, but in the case of the initial I.Q.'s we can
clearly assume that there is no difference in the means of the populations
corresponding to the two groups.

The initial I.Q. measurements are of course of no use in themselves in studying
differences introduced by the training. Nevertheless, if they are correlated with
the discriminators, they may serve in some way to ‘improve’ the discriminant:
e.g. to increase the power of Hotelling's $T^2$ test, or to reduce the number of errors
in classification. This paper discusses the problem of utilizing such
measurements, which will be called covariance variates. The problem is analogous
to that which is solved by the analysis of covariance. In covariance, as applied 
for instance in a controlled experiment, variates that are unaffected by the 
experimental treatments can be used to provide more accurate estimates of the 
effects of the treatments or to increase the power of the F test of the differences 
among the treatment means. 

The procedure suggested is as follows. First, the multiple regression is
obtained of each discriminator on all the covariance variates. These regressions
are calculated from the ‘within-sample’ sums of squares and products: that is,
from the sums of squares and products of deviations of the individual
measurements from their sample means. Each discriminator is then replaced by its
deviations from the multiple regression, and a new discriminant function is
calculated in the usual way from these deviations. The extensions of Hotelling's
$T^2$ and Mahalanobis' distance are both obtained from this discriminant, though a
further adjustment factor is needed for tests of significance.

This paper is arranged in three parts. Part I presents a numerical example.
The decision to place the example first was taken because most of the actual
applications of the discriminant function in the literature appear to have been
made by persons relatively unfamiliar with the theory of multivariate analysis.
It is hoped that with the aid of the example readers in this class may be able to
utilize covariance variates. For the same reason, the calculations have been
presented as far as possible in terms of the operations of ordinary multiple
regression, rather than in the form in which they first emerge from the theory.
Actually, various equivalent methods of calculation are available, and it is not
claimed that our method is necessarily the best. A mathematical statistician
may prefer to follow the computing methods which come directly from theory
(Part II, section 13).

The example is more complex in structure than the two-sample case. The
data constitute a two-way classification, in which the row means are nuisance
parameters, being of no interest, while only a single linear function of the column
means is of interest. It is well known that the ordinary $t$ test can be applied
not only to the difference between two sample means, but to any linear function
of a number of sample means in data that are quite complex. Discriminant
function technique can be extended in the same way, and readers familiar with
the analysis of variance should find no great difficulty in making the appropriate
extension to such data.

Part II presents the theory. The reader who is primarily interested in theory 
should read Part II before Part I. Since the approaches used by Mahalanobis,
Hotelling and Fisher all converge, we have chosen that of Mahalanobis, mainly
because the extension of his techniques to include covariance variates seems
straightforward. Maximum likelihood estimation of the generalized distance
is presented in full for the two-population case. The frequency distribution of
the estimated distance and the extension of the $T^2$ test are worked out. An
attempt is also made to obtain a quantity that will measure what has been gained
by the use of covariance.

In order to illustrate how the theory applies with other types of data, the 
mathematical model is given for the row by column classification that occurs in 
the example. The major results for this model are indicated, though without 
proof. 

In Part III it is shown that the computational methods used in the example 
are equivalent to those developed by theory. While this can easily be verified 
in a particular case, it is not intuitively obvious. 

PART I. Numerical Example

3. Description. The data form part of an experiment on the assay of insulin
of which other parts have been published [5]. Twelve rabbits were used.
Each rabbit received in succession four doses of insulin, equally spaced on a log
scale. An interval of eight days or more elapsed between successive doses, and
the order in which the doses were given to any rabbit was determined by
randomization. Thus the experiment is of the ‘randomized blocks’ type, where each
rabbit constitutes a block and there are 12 blocks with 4 treatments each.

The effect of insulin is usually measured by some function of the blood sugar
of the rabbit in periodic bleedings after injection of the insulin. The blood sugar
was measured for each rabbit at 1, 2, 3, 4, and 5 hours after injection, and also
before injection. In order to simplify the arithmetic, only the initial blood sugar
and the blood sugars at 3 and 4 hours after injection will be considered here.
These data are shown for the first three rabbits (with totals for all 12 rabbits)
in Table 1.

Let $x_{iwz}$ be a typical observation of blood sugar, where $i = 3, 4$ stands for the
hour after injection, $w$ for the rabbit and $z$ for the dose. The mathematical
model to be used is as follows.

(1)  $x_{iwz} = \mu_i + \rho_{iw} + \gamma_{iz} + \beta_{i0}(x_{0wz} - \bar{x}_{0\cdot\cdot}) + e_{iwz}$.

The parameters $\mu_i$, $\rho_{iw}$ and $\gamma_{iz}$ represent the true mean and the effects of rabbit
and log dose respectively. The quantity $x_{0wz}$ is the initial blood sugar for the
rabbit $w$ before the test at dose $z$, while $\bar{x}_{0\cdot\cdot}$ is the average initial blood sugar over
the whole experiment. The blood sugar at $i$ hours has been found experimentally
to be correlated with the corresponding initial blood sugar, and the relationship
is represented here as a linear regression, with $\beta_{i0}$ as the regression coefficient.
The residuals $e_{iwz}$ are assumed to follow a multivariate (in this case bivariate)
normal distribution, with zero means. The covariance between $e_{iwz}$ and $e_{jwz}$
is taken as $\sigma_{ij\cdot 0}$. The model is the standard one for the ordinary analysis of
covariance, except that we have two measures of the effect of insulin, $x_3$ and $x_4$.

One additional assumption was made. For all post-injection readings the
blood sugar seemed linearly related to the log dose $t_z$. Since this result has been
found in other experiments, we assumed that

$\gamma_{iz} = \delta_i t_z$,

where $\delta_i$ is the regression coefficient of blood sugar on log dose.

4. Object of the analysis. Our object was to find the linear combination of
the three blood sugar readings that would best measure the effect of the insulin.
Because of the linearity of the regression on log dose, the effect of insulin on each
$x_i$ is known completely if the slope $\delta_i$ is known. It seems reasonable to choose
the linear compound of the $x$'s which will give the maximum ratio when its
estimated regression on log dose is divided by the estimated standard error of
this regression. We now consider how to obtain this maximum. The argument
given below is not intended to prove the validity of the method, for which
reference should be made to Part II.

TABLE 1

Sample of original data on blood sugar levels in insulin experiment

Rabbit No.       Initial blood sugar x0        Three hours x3           Four hours x4
Log dose        .32    .47    .62    .77     .32    .47    .62    .77    .32    .47    .62    .77
1, 2, 3         (individual readings not legible in this copy)
Totals*        1065   1074   1121   1070     932    872    731

*12 rabbits. Totals not shown are not legible in this copy.

The true regression of the original blood sugar $x_0$ on log dose is known to be
zero. It is clear that the variate $x_0$ is useful only in so far as it enables us to
obtain more precise estimates of $\delta_3$ and $\delta_4$. For this purpose we
estimate the effect of $x_0$ upon $x_3$ and $x_4$, the blood sugar readings at 3 and 4 hours.
From the standard theory of covariance the best estimate is the regression coefficient
$b_{i0} = E_{i0}/E_{00}$, where $E$ denotes a sum of squares or products calculated from
the error line of the analysis of covariance, that is, from the sums of squares and
products of deviations from the fitted row and column means.
The regression of the blood sugar at each hour on the log dose of insulin is
calculated from totals adjusted for the regression on $x_0$. Since the 4 successive
log doses ($z = 1, 2, 3, 4$) are spaced equally, they may be replaced in the
computation by the coded doses $-3$, $-1$, $+1$, and $+3$. If we let $T_{iz}$ be the total
blood sugar, summed over 12 rabbits, at the $i$th hour with dose $z$, the following
result is well known for the analysis of covariance. The best estimate of
$\delta_i$ ($i = 3, 4$) is

$[(-3T_{i1} - T_{i2} + T_{i3} + 3T_{i4}) - b_{i0}(-3T_{01} - T_{02} + T_{03} + 3T_{04})]/240$.

The divisor, 240, is $12(3^2 + 1^2 + 1^2 + 3^2)$. The expression may be written

$\frac{d'_i}{240} = \frac{d_i - b_{i0} d_0}{240}$,

where

$d_i = -3T_{i1} - T_{i2} + T_{i3} + 3T_{i4}$.
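To make the arithmetic concrete, here is a minimal Python sketch of the dose contrast $d_i$ and of the covariance-adjusted slope $(d_i - b_{i0}d_0)/240$. The helper names `dose_contrast` and `adjusted_slope` are hypothetical; the only data used are the $x_0$ dose totals of Table 1 and, in the last line, the values $d_3 = -1164$ and $b_{30} = 1259/2351$ that appear later in this section.

```python
# Coded log doses for four equally spaced dose levels.
CODED_DOSES = (-3, -1, 1, 3)
N_RABBITS = 12

# Divisor of the slope estimate: 12 * (3^2 + 1^2 + 1^2 + 3^2) = 240.
DIVISOR = N_RABBITS * sum(t * t for t in CODED_DOSES)


def dose_contrast(totals):
    """Return d = -3*T1 - T2 + T3 + 3*T4 for dose totals T1..T4 (each over 12 rabbits)."""
    if len(totals) != len(CODED_DOSES):
        raise ValueError("expected one total per dose level")
    return sum(t * total for t, total in zip(CODED_DOSES, totals))


def adjusted_slope(d_i, d_0, b_i0):
    """Covariance-adjusted estimate of delta_i, i.e. (d_i - b_i0 * d_0) / 240."""
    return (d_i - b_i0 * d_0) / DIVISOR


x0_totals = (1065, 1074, 1121, 1070)  # initial blood sugar totals, Table 1
d0 = dose_contrast(x0_totals)
print(d0, DIVISOR)
print(round(adjusted_slope(-1164, d0, 1259 / 2351), 3))
```

Because the contrast coefficients sum to zero, adding the same constant to all four totals leaves $d_i$ unchanged, which is why the overall level of blood sugar does not enter the slope.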

A linear combination is formed from $d'_3$ and $d'_4$, the numerators in the best
estimates of $\delta_3$ and $\delta_4$, by means of the coefficients $L_3$ and $L_4$. $L_3$ and $L_4$ are
computed so as to maximize the ratio of

$d_I = L_3 d'_3 + L_4 d'_4$

to its estimated standard error.

From the definition of $d'_i$, this requires a discriminant of the form

$I = \sum_i L_i (x_i - b_{i0} x_0)$,

where each $x$ is measured from its mean.

We require next the estimated standard error of $d_I$. This depends, in turn,
upon the variances of $d'_3$ and $d'_4$ and their covariance. As usual in the analysis
of covariance we have

(5)  $V(d'_3) = V(d_3) + d_0^2 V(b_{30}) = \sigma_{33\cdot 0}\left(240 + \frac{d_0^2}{E_{00}}\right)$.


The residual variance $\sigma_{33\cdot 0}$ is estimated from the sums of squares and products
in the error line of the analysis of covariance as

$s^2_{33\cdot 0} = E_{33\cdot 0}/n = (E_{33} - E_{30}^2/E_{00})/n$,

where $n$ is the degrees of freedom in each $E_{ij}$ diminished by one. Similar methods
lead to the variance of $d'_4$ and to the covariance of $d'_3$ and $d'_4$. It follows that the
true variance of $d_I$ may be written

(6)  $V(d_I) \propto L_3^2 \sigma_{33\cdot 0} + 2L_3L_4 \sigma_{34\cdot 0} + L_4^2 \sigma_{44\cdot 0}$,

where the factor $(240 + d_0^2/E_{00})$ in equation (5) is omitted since it does not involve
the $L$'s. Similarly, the estimated variance of $d_I$, apart from constant factors,
may be written as

(7)  $L_3^2 E_{33\cdot 0} + 2L_3L_4 E_{34\cdot 0} + L_4^2 E_{44\cdot 0}$.

The quantity to be maximized is therefore

$\frac{L_3 d'_3 + L_4 d'_4}{\sqrt{L_3^2 E_{33\cdot 0} + 2L_3L_4 E_{34\cdot 0} + L_4^2 E_{44\cdot 0}}}$.

Formally, this is the same type of quantity that is maximized in ordinary analysis
with the discriminant function. Differentiation with respect to the $L$'s leads
to the equations (after omission of another constant factor)

(8)  $E_{33\cdot 0}L_3 + E_{34\cdot 0}L_4 = d'_3, \quad E_{34\cdot 0}L_3 + E_{44\cdot 0}L_4 = d'_4$.

The objective of the computation, therefore, is to obtain discriminant coefficients 
having the same ratio to each other as $L_3$ and $L_4$ in equations (8). As will be
shown in the next section, this can be accomplished in practice more conveniently 
by substituting an alternative set of three simultaneous equations for the two 
in equations (8). 


5. Calculations. The first step is to form the sums of squares and products
in the analysis of covariance. With 12 rabbits and 4 doses, the conventional
breakdown of each total sum is into components for rabbits (11 d.f.), doses
(3 d.f.) and rabbits × doses (33 d.f.). Because of the assumed linear regression
on log dose, the sum of squares for doses was further divided into two
components. The first (1 d.f.) is the contribution due to this regression. For $x_i$,
the sum of squares due to regression is $d_i^2/240$, or in the case of $x_3$, $(1164)^2/240$,
or 5645. The remaining component (2 d.f.) is called the curvature, since it
measures the effect of deviations from the linear regression. The sum of squares
for curvature is found by subtraction.

The following points may be noted. (i) For both $x_3$ and $x_4$, the $F$ ratio of the
curvature mean square to the rabbits × doses mean square will be found to be
less than 1, so that the data do not suggest rejection of the hypothesis of a linear
regression on log dose. (ii) The $F$ ratios of the regression mean squares to the
rabbits × doses mean squares are highly significant, being 57.8 for $x_3$ and 28.7
for $x_4$. This indicates, incidentally, that the three-hour reading may be a more
responsive measure of the effect of insulin than the four-hour reading. (iii) With
$x_0$, the $F$ ratio does not approach significance for either the regression or the
curvature, as is to be expected.
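The $F$ ratios in points (i) to (iii) can be re-derived from Table 2 and the contrasts $d_0 = 62$, $d_3 = -1164$, $d_4 = -809$ used elsewhere in this section. The Python sketch below is a hypothetical check, not the authors' worksheet: it recomputes each regression sum of squares as $d_i^2/240$, takes the curvature by subtraction from the doses line, and divides each mean square by the rabbits × doses mean square on 33 d.f.

```python
ERROR_DF = 33
# Contrast numerators d_i and Table 2 sums of squares for each variate.
d = {"x0": 62, "x3": -1164, "x4": -809}
doses_ss = {"x0": 168, "x3": 5806, "x4": 2810}   # between doses, 3 d.f.
error_ss = {"x0": 2351, "x3": 3223, "x4": 3137}  # rabbits x doses, 33 d.f.

results = {}
for v in ("x0", "x3", "x4"):
    reg_ss = d[v] ** 2 / 240          # regression on log dose, 1 d.f.
    curv_ss = doses_ss[v] - reg_ss    # curvature, 2 d.f., by subtraction
    error_ms = error_ss[v] / ERROR_DF
    results[v] = (reg_ss / error_ms, (curv_ss / 2) / error_ms)
    print(f"{v}: F(regression) = {results[v][0]:.2f}, F(curvature) = {results[v][1]:.2f}")
```

For $x_0$ the curvature ratio comes out close to 1, which is consistent with point (iii): neither component approaches significance for the initial reading.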

One consequence of the assumption of linear regression on log dose is that the
curvature mean squares and products are estimates of the same quantities as
the rabbits × doses mean squares and products. Consequently the lines for
curvature and rabbits × doses could be pooled to give the
error sums of squares or products, $E_{33}$, etc. We decided, however, to estimate
the error only from the 33 d.f. for rabbits × doses. This was done because it
seemed to facilitate a test of the curvature of the final discriminant $I$. (This
test will not be reported here.)

The $L$'s could now be obtained from equations (8). In this case the first
equation would contain the terms

$E_{33\cdot 0} = 3223 - (1259)^2/2351$;  $E_{34\cdot 0} = 1200 - (1259)(1340)/2351$;

$d'_3 = d_3 - b_{30}d_0 = -1164 - (1259/2351)(62)$,

leading to the simultaneous equations

$2548.8L_3 + 482.4L_4 = -1197.2$
$482.4L_3 + 2373.2L_4 = -844.3$,

which give $L_3/L_4 = (-.41848)/(-.27070) = 1.5459$.
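As a check on equations (8), the following Python sketch computes the adjusted error-line quantities from Table 2, solves the two equations by Cramer's rule, and confirms numerically that the solution maximizes the ratio of $d_I$ to its standard error (by the Cauchy-Schwarz inequality no other pair $L_3$, $L_4$ can give a larger absolute ratio). The variable names are illustrative only.

```python
import math

# Error-line (rabbits x doses) sums of squares and products, Table 2.
E00, E03, E04 = 2351, 1259, 1340
E33, E34, E44 = 3223, 1200, 3137
d0, d3, d4 = 62, -1164, -809  # dose contrasts

# Quantities adjusted for the regression on x0.
E33_0 = E33 - E03 ** 2 / E00
E34_0 = E34 - E03 * E04 / E00
E44_0 = E44 - E04 ** 2 / E00
d3_adj = d3 - (E03 / E00) * d0
d4_adj = d4 - (E04 / E00) * d0

# Solve equations (8) by Cramer's rule.
det = E33_0 * E44_0 - E34_0 ** 2
L3 = (d3_adj * E44_0 - E34_0 * d4_adj) / det
L4 = (E33_0 * d4_adj - E34_0 * d3_adj) / det


def ratio(l3, l4):
    """d_I divided by its estimated standard error (constant factors dropped)."""
    num = l3 * d3_adj + l4 * d4_adj
    var = l3 ** 2 * E33_0 + 2 * l3 * l4 * E34_0 + l4 ** 2 * E44_0
    return num / math.sqrt(var)


print(f"L3 = {L3:.5f}, L4 = {L4:.5f}, L3/L4 = {L3 / L4:.4f}")
# Other trial coefficient pairs give a smaller absolute ratio.
for trial in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0)]:
    assert abs(ratio(*trial)) <= ratio(L3, L4) + 1e-9
```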

TABLE 2

Sums of squares and products

Component             d.f.    x0²     x3²     x4²    x0x3    x0x4    x3x4
Between rabbits        11     886    9376   11165    1952    2477    9206
Between doses           3     168    5806    2810    -247     -98    3981
  Reg. on log dose      1      16    5645    2727    -301    -209    3924
  Curvature             2     152     161      83      54     111      57
Rabbits × doses        33    2351    3223    3137    1259    1340    1200
Total                  47    3405   18405   17112    2964    3719   14387

Instead of using these equations, we propose to solve alternatively the set of
three equations

(9)  $S_{00}L_0 + S_{03}L_3 + S_{04}L_4 = d_0$
     $S_{30}L_0 + S_{33}L_3 + S_{34}L_4 = d_3$
     $S_{40}L_0 + S_{43}L_3 + S_{44}L_4 = d_4$,

where each $S_{ij}$ ($i, j = 0, 3, 4$) is the sum of squares or products formed by adding
the error line in the analysis of variance to the line for regression on log dose.
Thus $S_{ij}$ has 34 d.f. The ratio of $L_3$ to $L_4$, as found from equations (9), is exactly
the same as that found from the original equations (8), as is proved in section 18.
Further, the solution of the new equations seems to be more useful for performing
tests of significance, as will appear in following sections.
Accordingly, the first step after forming the analysis of variance is to set up
the three equations (9).
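The Sum line that supplies the coefficients of equations (9) is just the element-wise addition of two lines of Table 2. The following hypothetical Python helper assembles the symmetric matrix $S$ in the variate order $x_0, x_3, x_4$; storing each product once per unordered pair is an illustrative choice, not part of the original computation.

```python
# Entries keyed by variate pair (0 = x0, 3 = x3, 4 = x4), from Table 2.
error_line = {(0, 0): 2351, (3, 3): 3223, (4, 4): 3137,
              (0, 3): 1259, (0, 4): 1340, (3, 4): 1200}   # rabbits x doses, 33 d.f.
reg_line = {(0, 0): 16, (3, 3): 5645, (4, 4): 2727,
            (0, 3): -301, (0, 4): -209, (3, 4): 3924}     # regression on log dose, 1 d.f.
ORDER = (0, 3, 4)
SUM_DF = 33 + 1


def entry(table, i, j):
    """Look up a symmetric entry stored only once per unordered pair."""
    return table[(i, j)] if (i, j) in table else table[(j, i)]


def sum_line_matrix(error, regression, order=ORDER):
    """Return S_ij = E_ij + R_ij as a nested list in the given variate order."""
    return [[entry(error, i, j) + entry(regression, i, j) for j in order] for i in order]


S = sum_line_matrix(error_line, reg_line)
for row in S:
    print(row)
```

The rows printed agree with the left-hand sides of the equations in Table 3.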
The equations are solved by means of the inverse matrix. The values of $d_i$
on the right side of the equations are replaced successively by 1, 0, 0; by 0, 1, 0;
and by 0, 0, 1 to obtain the three sets of values for $L_0$, $L_3$ and $L_4$. These results
are given in the first three columns of Table 4 and are designated as $c_{ij}$.

The $L$'s follow from the $d_i$ by the usual rule for regressions. For example,

$L_3 = \{(.003209)(62) + (.227781)(-1164) + (-.199655)(-809)\} \times 10^{-3} = -.103417$.


TABLE 3

Equations for determining the $L$'s

$2367L_0 + 958L_3 + 1131L_4 = 62$
$958L_0 + 8868L_3 + 5124L_4 = -1164$
$1131L_0 + 5124L_3 + 5864L_4 = -809$


The composite response or discriminant, adjusted for the covariance variate, is
now taken as

$I = \sum_i L_i\left(x_i - \frac{E_{i0}}{E_{00}}x_0\right)$

or

$I = -.103417\left(x_3 - \frac{1259}{2351}x_0\right) - .066883\left(x_4 - \frac{1340}{2351}x_0\right)$
$\quad = .093503x_0 - .103417x_3 - .066883x_4$.

Note that the value of $L_0$ is not used at this stage and that $L_3/L_4 = 1.546$ agrees
with the value found from equations (8).
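Collecting the adjusted discriminant into a single linear form in $x_0$, $x_3$, $x_4$ is a small calculation; the Python sketch below reproduces the coefficient .093503 of $x_0$ from $L_3$, $L_4$ and the error-line regressions $b_{30} = 1259/2351$, $b_{40} = 1340/2351$. The function `discriminant` is a hypothetical helper, and its arguments are assumed to be deviations from the respective means, as the text requires.

```python
# Discriminant coefficients L3, L4 (Table 4) and error-line regressions on x0 (Table 2).
L3, L4 = -0.103417, -0.066883
b30, b40 = 1259 / 2351, 1340 / 2351

# I = L3*(x3 - b30*x0) + L4*(x4 - b40*x0), collected by variate.
coef_x0 = -(L3 * b30 + L4 * b40)
coefficients = {"x0": coef_x0, "x3": L3, "x4": L4}


def discriminant(x0, x3, x4):
    """Adjusted discriminant I; arguments are deviations from their means."""
    return coef_x0 * x0 + L3 * x3 + L4 * x4


print({name: round(c, 6) for name, c in coefficients.items()})
print(round(L3 / L4, 3))
```

The positive coefficient for $x_0$ reflects the correction: a rabbit with a high initial blood sugar is expected to read high later regardless of dose, so that part of $x_3$ and $x_4$ is removed.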


TABLE 4

Inverse matrix (× 10³) and L's

          10³c_i0   10³c_i3   10³c_i4      d_i        L_i
i = 0     .465408   .003209  -.092568       62     .100008
i = 3     .003209   .227781  -.199655    -1164    -.103417
i = 4    -.092568  -.199655   .362846     -809    -.066883
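The inversion behind Table 4 can be reproduced with a few lines of Python. The sketch below uses plain Gauss-Jordan elimination with partial pivoting (a standard method, assumed here rather than taken from the paper) to invert the Table 3 matrix, then forms $L_i = \sum_j c_{ij} d_j$ by the same rule used above for $L_3$.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the corresponding row of the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


S = [[2367, 958, 1131], [958, 8868, 5124], [1131, 5124, 5864]]  # Table 3
d = [62, -1164, -809]                                            # d0, d3, d4
C = invert(S)
L = [sum(c * dj for c, dj in zip(row, d)) for row in C]          # L_i = sum_j c_ij d_j

for name, row, l in zip(("x0", "x3", "x4"), C, L):
    print(name, [round(1e3 * c, 6) for c in row], round(l, 6))
```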


A similar method may be followed when there are more discriminators or more
covariance variates. With two covariance variates, $x_0$ and $x'_0$, for instance, the
adjusted discriminant would be

$L_3(x_3 - b_{30}x_0 - b'_{30}x'_0) + L_4(x_4 - b_{40}x_0 - b'_{40}x'_0)$,

where $b_{30}$, $b'_{30}$ are the partial regression coefficients of $x_3$ on $x_0$, $x'_0$ respectively,
determined from the error line, and similarly for $x_4$. Further, since any linear
function of the column (dose) means may be represented as a regression on some
variate $t_z$, this method may be applied to any linear function of the column means
in which we are interested, provided that the mathematical model is appropriate.

6. Test of the regression of the adjusted discriminant on log dose. The
numerator of the regression of $I$ on the coded doses is

$d_I = L_3(d_3 - b_{30}d_0) + L_4(d_4 - b_{40}d_0)$.

Since the regressions of $x_3$ and $x_4$ on the coded doses were both significant, it
may be confidently expected that the regression of $I$ will also be significant. The
test of significance will, however, be given in case it may be useful in other
applications. For those who are familiar with multiple regression, the test is perhaps
most easily made by means of a device due to Fisher [2].

Construct a dummy variate $y_{wz}$ such that $y_{wz}$ is always equal to $t_z$, or in our
case to the coded doses. That is, $y$ takes the value $-3$ for all observations at
the lowest dose level, and $-1$, $+1$, and $+3$ respectively for all observations at
the successive higher dosage levels. We shall show that equations (9) solved
in finding the $L$'s are formally the same as a set of normal equations for the linear
regression of $y$ on $x_0$, $x_3$, and $x_4$.

The following analysis for $y^2$ and $yx_i$ (Table 5) may easily be verified.

TABLE 5

Analysis of y² and yx_i

Component                              d.f.     y²    yx_i
Rabbits                                 11       0      0
Doses                                    3     240    d_i
  Regression on log dose                 1     240    d_i
  Curvature                              2       0      0
Rabbits × doses = error                 33       0      0
Sum = error plus reg. on log dose       34     240    d_i
Total                                   47     240    d_i

It will be noted that the sum of products of $y$ and $x_i$ in the Sum line is $d_i$.
Further, $S_{ij}$ is the sum of products of $x_i$ and $x_j$ for this line. It follows that the
normal equations for the regression of $y$ on the $x$'s, as calculated from the ‘Sum’
line, are

$S_{i0}L_0 + S_{i3}L_3 + S_{i4}L_4 = d_i \quad (i = 0, 3, 4)$.

These are just the equations solved in obtaining the $L$'s. Consequently, $L_3$
and $L_4$ are the partial regression coefficients of $y$ on $x_3$ and $x_4$. A test of the null
hypothesis that the true values of $L_3$ and $L_4$ are both zero can be made by the
standard method for multiple regression, as will be shown later from theory.
This test is equivalent to a test of the hypothesis that the true value of $d_I$ is zero.

To apply the test, we require three items in the analysis of variance of $y$.
First, the total sum of squares for the Sum line, already seen to be 240 (Table 5).
Second, the reduction due to a regression on all variates (covariance variates
plus discriminators). By the usual rules for regression, this is (from Table 4)

$L_0d_0 + L_3d_3 + L_4d_4 = (.100008)(62) + (-.103417)(-1164) + (-.066883)(-809) = 180.69$.

Finally, we need the reduction due to a regression on the variates that aro not 
being tested, i e. on the covariance variates alone. From Table 4, the reduction 

TABLE 0 


Analysis of variance of dummy variale y 



d. f. | 

R. S. 

M, R. 

Reduction to regression on covariance variates. 
Additional reduction to regression on dis- 

1 

i 

1 

1.62 


criminators. 

2 

179.07 

89.51 

1.913 

i 

Deviations . 

31 

59.31 

Total (from Sum line) . 

1 " "j 

i 34 i 

240.00 

!' 

1 


in this case is simply dl/Sw or (62) 2 /2367, or 1.62. The difference, 180.09 — 1.62, 
represents the reduction due to the regression of y on L% and In , after fitting a; 0 , 
The resulting analysis is given below, the degrees of freedom being apportioned 
by the usual rules. 

The $F$ ratio, 89.54/1.913, or 46.80, with 2 and 31 d.f., is used to test the null hypothesis that the adjusted discriminant has no real regression on log dose.
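The entries of Table 6 can be rebuilt from three printed quantities: the total 240.00, the full reduction 180.69, and $d_0^2/S_{00}$. The short Python sketch below does only that arithmetic. It takes the printed values as given and is not a general analysis-of-variance routine; the function name `anova_dummy_y` is hypothetical.

```python
def anova_dummy_y(total_ss, full_reduction, d0, s00, df_disc, df_dev):
    """Rebuild the entries of Table 6 from the printed totals."""
    cov_reduction = d0 ** 2 / s00               # regression on x_0 alone
    extra = full_reduction - cov_reduction      # additional reduction on x_3, x_4
    deviations = total_ss - full_reduction      # residual sum of squares
    ms_extra = extra / df_disc
    ms_dev = deviations / df_dev
    return {
        "cov_reduction": cov_reduction,
        "extra": extra,
        "deviations": deviations,
        "ms_extra": ms_extra,
        "ms_dev": ms_dev,
        "F": ms_extra / ms_dev,
    }


table6 = anova_dummy_y(240.00, 180.69, d0=62, s00=2367, df_disc=2, df_dev=31)
for key, value in table6.items():
    print(f"{key:>14}: {value:.3f}")
```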

7. Test of particular discriminators. Another useful test is that of the null hypothesis that a particular discriminator, or group of discriminators, contributes nothing to the adjusted discriminant. In other words, this is a test of the null hypothesis that the true values of a subset of the $L$'s are all zero. The test is of interest in the present investigation, since it would be useful to know whether all five hourly readings of the blood sugar are really helpful. As might be expected by analogy with the previous section, the test is made by calculating the additional reduction due to the regression of $y$ on the variates corresponding to the subset of $L$'s in question.

The test will be illustrated with respect to $L_4$. One method of making the test is to re-solve the normal equations with $L_4$ omitted. From this solution the reduction due to a regression of $y$ on $x_0$ and $x_3$ alone is obtained. The additional reduction due to a regression on $x_4$ is found by subtraction from 180.69.

However, the additional reduction can be found directly from the well-known regression theorem that it is equal to $L_4^2/c_{44}$. The $c$'s have already been found in Table 4. The result is $(0.066883)^2/(0.000362846)$, or 12.33. This value is tested against the residual error mean square of 1.913, $F$ having 1 and 31 d.f. The contribution is found to be significant.

In fact, by this process a kind of estimated standard error can be attached to each of the $L$'s for the discriminators, using the formula $s\sqrt{c_{ii}}$, where $s$ is the residual root mean square. Thus for $L_3$ ($-0.103417$), the 'standard error' is $\sqrt{(1.913)(0.000227781)}$, or 0.0209. It should be stressed that at this point the analogy with regression is rather thin. The $L$'s are not normally distributed, nor do the estimated standard errors follow their usual distribution. It is, however, correct that if the true value of $L_i$ is zero, $L_i/(s\sqrt{c_{ii}})$ follows the $t$ distribution with 31 d.f. Thus, if omission of some discriminators seems warranted, these $t$ ratios are relevant in deciding which variate to eliminate first. Strictly speaking, the $c$'s should be re-calculated after each elimination before deciding which other discriminators might also be discarded.

TABLE 7

Analysis of variance for regression of y on the discriminators

Source        d.f.     S.S.     M.S.
Regression      2    159.20    79.60
Deviations     32     80.80     2.525
Total          34    240.00
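The partial test and the 'standard error' above reduce to two formulas, $L_i^2/c_{ii}$ and $s\sqrt{c_{ii}}$. A minimal Python sketch, assuming the printed values $|L_4| = 0.066883$, $c_{44} = 0.000362846$, $L_3 = -0.103417$, $c_{33} = 0.000227781$ and the residual mean square 1.913; the helper names are illustrative only. It also shows that the squared $t$ ratio equals the corresponding single-d.f. $F$.

```python
import math


def added_last_reduction(L, c_ii):
    """Additional reduction in S.S. when a variate enters last: L_i^2 / c_ii."""
    return L ** 2 / c_ii


def pseudo_t(L, c_ii, residual_ms):
    """'Standard error' s * sqrt(c_ii) and the ratio L_i / (s * sqrt(c_ii))."""
    se = math.sqrt(residual_ms * c_ii)
    return se, L / se


residual_ms = 1.913                                   # Table 6, 31 d.f.
extra_x4 = added_last_reduction(0.066883, 0.000362846)
F_x4 = extra_x4 / residual_ms                         # 1 and 31 d.f.
se_L3, t_L3 = pseudo_t(-0.103417, 0.000227781, residual_ms)
print(f"reduction for x_4: {extra_x4:.2f}, F = {F_x4:.2f}")
print(f"L_3: standard error {se_L3:.4f}, t = {t_L3:.2f}")
```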

8. Estimation of the gain due to covariance. The tests given above enable us 
to state whether the discriminators contribute significantly, in the statistical 
sense. It is also of interest to investigate what has been gained by the use of 
the covariance variates. From the practical point of view, the question "What is the gain from covariance?" might be re-phrased as: "If $x_0$ is ignored, how many rabbits must be tested in order to estimate the regression on log dose as accurately as it was estimated with the adjusted discriminant for 12 rabbits?"

The theoretical aspects of the question are discussed in section 16; the calculations are described here. The only new quantity needed is the $F$ ratio for the regression of $y$ on the discriminators alone. This can be obtained by a new solution of the normal equations, this time with the covariance variates omitted. With just one covariance variate, it is quicker to use the fact that the additional reduction due to the regression of $y$ on $x_0$, after fitting $x_3$ and $x_4$, is $L_0^2/c_{00}$, or $(0.100008)^2/(0.000465408)$, or 21.49. Consequently, the reduction due to a regression of $y$ on $x_3$ and $x_4$ alone is $180.69 - 21.49$, or 159.20. The $F$ ratio, 79.60/2.525, is 31.52, whereas the $F$ ratio with covariance is 46.80 (from Table 6). The quantity suggested from theory for comparing the two techniques is

$$\frac{(n_2 - 2)F}{n_2} - 1,$$

where $n_2$ is the number of d.f. in the denominator of $F$. These values are $(30 \times 31.52/32) - 1$, or 28.55, with no covariance and $(29 \times 46.80/31) - 1$, or 42.78, with covariance. The relative information is estimated as 42.78/28.55, or 1.50, so that the use of covariance gives 50 per cent more information. In other words, about 18 rabbits would be needed if the initial blood sugars were ignored. To a slight extent this estimate favors the covariance analysis, since it ignores the increased accuracy that would accrue from the extra error d.f. if 18 rabbits were used without covariance.
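The comparison above is simple arithmetic on the two $F$ ratios. The sketch below reproduces it in Python from the printed $F$ values and denominator degrees of freedom (32 without covariance, 31 with). `information_index` is a hypothetical helper name, and the 12-rabbit scaling is taken from the text.

```python
def information_index(F, n2):
    """(n2 - 2) F / n2 - 1, the comparison quantity used for each analysis."""
    return (n2 - 2) * F / n2 - 1


without_cov = information_index(31.52, n2=32)   # Table 7: discriminators only
with_cov = information_index(46.80, n2=31)      # Table 6: with covariance on x_0
relative = with_cov / without_cov
print(f"{without_cov:.2f} {with_cov:.2f} {relative:.2f} {12 * relative:.1f} rabbits")
```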

PART II. THEORY

9. Notation. The theory will be given first for the two-population case. We suppose that a random sample of size $N$ has been drawn from each population. A typical discriminator is written $x_{iw\alpha}$ and a typical covariance variate $x_{\xi w\alpha}$, where

$i, j = 1, 2, \cdots, p$ denote discriminators,
$\xi, \eta = 1, 2, \cdots, k$ denote covariance variates,
$w = 1, 2$ denotes the population, and
$\alpha = 1, 2, \cdots, N$ denotes the order within the sample.

The population mean of $x_{iw\alpha}$ is $\mu_{iw}$, and the corresponding sample mean is $\bar{x}_{iw\cdot}$. The difference $(\mu_{i1} - \mu_{i2})$ is denoted by $\delta_i$ and the corresponding estimated difference $(\bar{x}_{i1\cdot} - \bar{x}_{i2\cdot})$ by $d_i$.

10. Discriminant functions and generalized distance. Since we propose to approach the theory by means of the generalized distance, it may be well to review briefly the relation between the discriminant and the generalized distance. In the ordinary theory (with no covariance variates) it is assumed that the variates $x_{iw\alpha}$ follow a multivariate normal distribution, and that the covariance matrix $\sigma_{ij}$ between $x_{iw\alpha}$ and $x_{jw\alpha}$ is the same in both populations. The generalized distance of Mahalanobis is defined by

$$p\Delta^2 = \sum_{i,j=1}^{p}\sigma^{ij}\delta_i\delta_j, \qquad \text{where } (\sigma^{ij}) = (\sigma_{ij})^{-1}.$$

In order to estimate this quantity from the sample, we first calculate the mean within-sample covariance $s_{ij}$, where

$$s_{ij} = \sum_{w=1}^{2}\sum_{\alpha=1}^{N}(x_{iw\alpha} - \bar{x}_{iw\cdot})(x_{jw\alpha} - \bar{x}_{jw\cdot}) \big/ 2(N - 1). \tag{11}$$

The estimated distance is then taken as

$$pD^2 = \sum_{i,j=1}^{p}s^{ij}d_i d_j, \tag{12}$$

where $(s^{ij})$ is the inverse of $(s_{ij})$. Apart from a factor $N/(N - 1)$, this is the maximum likelihood estimate.

In the discriminant function used by Fisher [1], the object is to find a linear function $I_{w\alpha} = \sum_i M_i x_{iw\alpha}$, where the $M_i$ are chosen to maximize the ratio of the sum of squares between samples to that within samples in the analysis of variance of $I$. This is equivalent to maximizing the ratio of the difference between the two sample means of $I$ to the estimated standard error of this difference. As Fisher showed [2], the $M_i$ (apart from an arbitrary multiplier) are given by

$$M_i = \sum_{j=1}^{p}s^{ij}d_j.$$

Consequently, the difference between the two sample means of $I$, the discriminant function, is

$$\sum_{i=1}^{p}M_i d_i = \sum_{i,j=1}^{p}s^{ij}d_i d_j.$$

This is exactly the same as $pD^2$ in equation (12). Thus the discriminant function leads to the estimated distance, and vice versa.
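To make the equivalence concrete, here is a small Python sketch using only the standard library and a hypothetical two-sample data set (two discriminators, four observations per sample). It computes $s_{ij}$ from (11), the weights $M_i = \sum_j s^{ij}d_j$, and $pD^2$ from (12), and checks that the difference of the sample means of $I$ equals $pD^2$. The names `solve`, `within_cov` and `fisher_discriminant` are illustrative, not taken from the paper.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def column_mean(sample, i):
    return sum(row[i] for row in sample) / len(sample)


def within_cov(s1, s2):
    """Pooled within-sample covariance s_ij with 2(N - 1) d.f., equation (11)."""
    N, p = len(s1), len(s1[0])
    s = [[0.0] * p for _ in range(p)]
    for sample in (s1, s2):
        m = [column_mean(sample, i) for i in range(p)]
        for row in sample:
            for i in range(p):
                for j in range(p):
                    s[i][j] += (row[i] - m[i]) * (row[j] - m[j])
    return [[v / (2 * (N - 1)) for v in row] for row in s]


def fisher_discriminant(s1, s2):
    """Return d_i, the weights M_i = sum_j s^{ij} d_j, and pD^2 of equation (12)."""
    p = len(s1[0])
    d = [column_mean(s1, i) - column_mean(s2, i) for i in range(p)]
    M = solve(within_cov(s1, s2), d)
    pD2 = sum(Mi * di for Mi, di in zip(M, d))
    return d, M, pD2


s1 = [[2, 3], [4, 5], [3, 7], [5, 6]]    # hypothetical sample from population 1
s2 = [[1, 1], [2, 4], [0, 2], [1, 3]]    # hypothetical sample from population 2
d, M, pD2 = fisher_discriminant(s1, s2)
I1 = sum(sum(Mi * x for Mi, x in zip(M, row)) for row in s1) / len(s1)
I2 = sum(sum(Mi * x for Mi, x in zip(M, row)) for row in s2) / len(s2)
print(pD2, I1 - I2)
```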


11. Extension to the present problem. In our case there are $(p + k)$ variates ($p$ discriminators, $k$ covariance variates) from which to estimate the distance. All variates, $x_{iw\alpha}$ and $x_{\xi w\alpha}$, are assumed to follow a multivariate normal distribution. The covariance matrix, assumed the same in both populations, now has $(p + k)$ rows and columns, and may be denoted by

$$A = \begin{pmatrix} \sigma_{ij} & \sigma_{i\eta} \\ \sigma_{\xi j} & \sigma_{\xi\eta} \end{pmatrix}. \tag{13}$$

For each of the covariance variates, it is known that the population means $\mu_{\xi 1}$, $\mu_{\xi 2}$ are equal, so that the difference $\delta_\xi$ is zero. This is the fact that distinguishes the problem from ordinary discriminant function analysis.

Hence, the generalized distance, as defined from all $(p + k)$ variates, contains no contribution from terms in $\delta_\xi$ and is given by

$$(p + k)\Delta^2 = \sum_{i,j=1}^{p}\sigma^{ij}_{(p+k)}\,\delta_i\delta_j. \tag{14}$$

The matrix $\sigma^{ij}_{(p+k)}$ is that formed by the first $p$ rows and columns of the inverse of $A$. Note that in general this will not be the same as the matrix $\sigma^{ij}$, which is the inverse of $\sigma_{ij}$.

In the next section we consider the estimation of this quantity from the sample data. By analogy with the previous section, it might be guessed that the estimate would be of the form $\sum s^{ij}_{(p+k)}d_i d_j$. The maximum likelihood estimate turns out to be of this form, except that instead of $d_i$ we have $d'_i$, the difference between the two sample means of the deviations of $x_i$ from its 'within-sample' linear regression on the $x_\xi$.


12. Estimation of the distance. It is known that the generalized distance is invariant under non-singular linear transformations of the variates. For convenience, we replace the $x_{iw\alpha}$ by variates $x'_{iw\alpha}$, where

$$x'_{iw\alpha} = x_{iw\alpha} - \sum_{\xi=1}^{k}\beta_{i\xi}(x_{\xi w\alpha} - \mu_{\xi w}).$$

Thus $x'_{iw\alpha}$ is the deviation of $x_{iw\alpha}$ from its population linear regression on the $x_{\xi w\alpha}$. The population mean of $x'_{iw\alpha}$ is clearly $\mu_{iw}$, and the difference between the two population means is therefore $\delta_i$.

The covariance matrix of the $x'_{iw\alpha}$, $x_{\xi w\alpha}$ may be written

$$\begin{pmatrix} \sigma_{ij\cdot\xi} & 0 \\ 0 & \sigma_{\xi\eta} \end{pmatrix}, \tag{15}$$

where $\sigma_{ij\cdot\xi}$ denotes the covariance matrix of the deviations of the $x_{iw\alpha}$ from their regressions on the $x_{\xi w\alpha}$. It follows that in terms of the transformed variates the generalized distance is given by

$$(p + k)\Delta^2 = \sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}\delta_i\delta_j, \tag{16}$$

where $\sigma^{ij\cdot\xi}$ is the inverse of the $p \times p$ matrix $\sigma_{ij\cdot\xi}$.
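The statement that $\sigma^{ij}_{(p+k)}$ in (14) coincides with $\sigma^{ij\cdot\xi}$ in (16) is the familiar partitioned-inverse (Schur complement) identity. Below is a numerical check in Python, assuming a hypothetical $3 \times 3$ covariance matrix with $p = 2$ discriminators and $k = 1$ covariance variate. The last quantity is the distance from the discriminators alone, which cannot exceed the covariance-adjusted distance.

```python
def inverse(A):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def partial_cov(A, p):
    """sigma_{ij.xi} = sigma_ij - sum sigma_{i xi} sigma^{xi eta} sigma_{eta j}."""
    k = len(A) - p
    inv22 = inverse([row[p:] for row in A[p:]])
    return [[A[i][j] - sum(A[i][p + a] * inv22[a][b] * A[p + b][j]
                           for a in range(k) for b in range(k))
             for j in range(p)] for i in range(p)]


def quad(Q, v):
    return sum(Q[i][j] * v[i] * v[j] for i in range(len(v)) for j in range(len(v)))


A = [[4.0, 1.0, 1.5],        # hypothetical covariance matrix: x_1, x_2, then x_xi
     [1.0, 3.0, 0.5],
     [1.5, 0.5, 2.0]]
p = 2
delta = [1.0, -0.5]          # delta_xi = 0 for the covariance variate
top_left = [row[:p] for row in inverse(A)[:p]]        # sigma^{ij}_{(p+k)} in (14)
schur_inv = inverse(partial_cov(A, p))                # sigma^{ij.xi} in (16)
dist_14 = quad(top_left, delta)
dist_16 = quad(schur_inv, delta)
dist_disc_only = quad(inverse([row[:p] for row in A[:p]]), delta)
print(dist_14, dist_16, dist_disc_only)
```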

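The joint distribution of the $2N$ observations on each of the $x'_{iw\alpha}$ and $x_{\xi w\alpha}$ is as follows:

$$(2\pi)^{-N(p+k)}\,|\sigma^{ij\cdot\xi}|^{N}\,|\sigma^{\xi\eta}|^{N}\exp\Big\{-\tfrac{1}{2}\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}\sum_{w=1}^{2}\sum_{\alpha=1}^{N}(x'_{iw\alpha} - \mu_{iw})(x'_{jw\alpha} - \mu_{jw}) - \tfrac{1}{2}\sum_{\xi,\eta=1}^{k}\sigma^{\xi\eta}\sum_{w=1}^{2}\sum_{\alpha=1}^{N}(x_{\xi w\alpha} - \mu_{\xi w})(x_{\eta w\alpha} - \mu_{\eta w})\Big\},$$

where $\sigma^{\xi\eta}$ is the inverse of the $k \times k$ matrix $\sigma_{\xi\eta}$.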

We now proceed to estimate $\Delta^2$ in equation (16) by maximum likelihood. For this, we obviously need the sample estimates of the $\sigma^{ij\cdot\xi}$ and the $\delta_i$, and it will appear presently that the sample estimates of the $\beta_{i\xi}$ are also required. However, it happens that the $\sigma_{\xi\eta}$ and the $\mu_{\xi w}$ are not needed. Hence the relevant part of the likelihood function is

$$L = N\log|\sigma^{ij\cdot\xi}| - \tfrac{1}{2}\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}\sum_{w=1}^{2}\sum_{\alpha=1}^{N}(x'_{iw\alpha} - \mu_{iw})(x'_{jw\alpha} - \mu_{jw}), \tag{17}$$

where

$$x'_{iw\alpha} = x_{iw\alpha} - \sum_{\xi=1}^{k}\beta_{i\xi}(x_{\xi w\alpha} - \mu_{\xi w}).$$

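Differentiating first with respect to $\mu_{jw}$, we obtain

$$\sum_{i=1}^{p}\sigma^{ij\cdot\xi}\sum_{\alpha=1}^{N}(x'_{iw\alpha} - \mu_{iw}) = 0. \tag{18}$$

Except in the case (with probability zero) where our estimate of $\sigma^{ij\cdot\xi}$ turns out to be singular, these equations have no solution except

$$\sum_{\alpha=1}^{N}(x'_{jw\alpha} - \mu_{jw}) = 0 \tag{19}$$

for every $j$, $w$. Consequently

$$\hat\mu_{jw} = \bar{x}'_{jw\cdot} = \bar{x}_{jw\cdot} - \sum_{\xi=1}^{k}\beta_{j\xi}(\bar{x}_{\xi w\cdot} - \mu_{\xi w}),$$

so that

$$\hat\mu_{j1} - \hat\mu_{j2} = d_j - \sum_{\xi=1}^{k}\beta_{j\xi}d_\xi.$$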

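This shows that the $\beta_{j\xi}$ must also be estimated. Now

$$\frac{\partial L}{\partial\beta_{j\xi}} = \sum_{w=1}^{2}\sum_{\alpha=1}^{N}\frac{\partial L}{\partial x'_{jw\alpha}}\,\frac{\partial x'_{jw\alpha}}{\partial\beta_{j\xi}} = \sum_{i=1}^{p}\sigma^{ij\cdot\xi}\sum_{w=1}^{2}\sum_{\alpha=1}^{N}(x_{\xi w\alpha} - \mu_{\xi w})(x'_{iw\alpha} - \mu_{iw}).$$

Once again, unless the estimate of $\sigma^{ij\cdot\xi}$ is singular, the only solutions of the equations formed by equating this quantity to zero are

$$\sum_{w=1}^{2}\sum_{\alpha=1}^{N}(x_{\xi w\alpha} - \mu_{\xi w})(x'_{jw\alpha} - \mu_{jw}) = 0 \tag{20}$$

for every $\xi$, $j$.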

Since $\hat\mu_{\xi w} = \bar{x}_{\xi w\cdot}$, the term in $\mu_{\xi w}$ vanishes. Substituting for $x'$ in terms of $x$ from (17), we obtain

$$\sum_{w=1}^{2}\sum_{\alpha=1}^{N}(x_{\xi w\alpha} - \bar{x}_{\xi w\cdot})\Big[(x_{jw\alpha} - \bar{x}_{jw\cdot}) - \sum_{\eta=1}^{k}b_{j\eta}(x_{\eta w\alpha} - \bar{x}_{\eta w\cdot})\Big] = 0,$$

where $b_{j\eta}$ stands for the maximum likelihood estimate of $\beta_{j\eta}$. These equations may be written

$$\sum_{\eta=1}^{k}b_{j\eta}E_{\xi\eta} = E_{j\xi}, \tag{21}$$

where $E$ denotes a sum of squares or products of deviations from the sample means, containing $2(N - 1)$ degrees of freedom. The equations are therefore the ordinary normal equations for the 'within-sample' multiple regression of $x_{jw\alpha}$ on the $x_{\xi w\alpha}$.

Finally, differentiation of $L$ with respect to the $\sigma^{ij\cdot\xi}$ leads to

$$2N\hat\sigma_{ij\cdot\xi} = \sum_{w=1}^{2}\sum_{\alpha=1}^{N}(x'_{iw\alpha} - \bar{x}'_{iw\cdot})(x'_{jw\alpha} - \bar{x}'_{jw\cdot}). \tag{22}$$

This is just the 'within-samples' sum of squares or products of the variates $x'$. On substituting for the $x'$ in terms of the $x$ and using equations (21), we obtain

$$2N\hat\sigma_{ij\cdot\xi} = E_{ij} - \sum_{\xi=1}^{k}b_{i\xi}E_{j\xi} = E_{ij\cdot\xi} \quad\text{(say)}.$$

To summarize, the estimated distance is given by means of the equation

$$(p + k)D^2 = \sum_{i,j=1}^{p}\hat\sigma^{ij\cdot\xi}d'_i d'_j = 2N\sum_{i,j=1}^{p}E^{ij\cdot\xi}d'_i d'_j,$$

where $E^{ij\cdot\xi}$ is the inverse of $E_{ij\cdot\xi}$ and

$$d'_i = d_i - \sum_{\xi=1}^{k}b_{i\xi}d_\xi.$$

This estimate was obtained by assuming all variates jointly normally distributed. From the form of the likelihood function (17) it can be seen that the M.L. estimate of the distance remains the same under the less restrictive assumptions that the $x_{\xi w\alpha}$ are fixed, while the deviations of the $x_{iw\alpha}$ from their regressions on the $x_{\xi w\alpha}$ are jointly normal.

13. Computational procedure. An orderly procedure for calculating the generalized distance will now be given. From this, the method for computing the corresponding discriminant function will be shown. The computations also lead to the generalization of Hotelling's $T^2$. The steps are as follows.

(i). First form the 'within-sample' sums of squares and products of all variates, with $2(N - 1)$ degrees of freedom. These are the quantities denoted by $E_{ij}$, $E_{i\xi}$, $E_{\xi\eta}$.

(ii). Invert the matrix $E_{\xi\eta}$, giving $E^{\xi\eta}$.

(iii). The regression coefficients $b_{i\xi}$, estimates of the $\beta_{i\xi}$, are now obtainable by means of the relations

$$b_{i\xi} = \sum_{\eta=1}^{k}E_{i\eta}E^{\eta\xi},$$

as is clear from the usual matrix solution of equations (21).

(iv). The sums of squares and products of the deviations of the $x_i$ from their 'within-sample' regressions on the $x_\xi$ are now computed from equations (22):

$$2N\hat\sigma_{ij\cdot\xi} = E_{ij\cdot\xi} = E_{ij} - \sum_{\xi=1}^{k}b_{i\xi}E_{j\xi}.$$

(v). The final step is to invert the matrix $E_{ij\cdot\xi}$, giving $E^{ij\cdot\xi}$, and to form the product

$$(p + k)D^2 = 2N\sum_{i,j=1}^{p}E^{ij\cdot\xi}d'_i d'_j, \qquad\text{where } d'_i = d_i - \sum_{\xi=1}^{k}b_{i\xi}d_\xi.$$


When there were no covariance variates, the discriminant function $I$ had the property that the difference between the two sample means of $I$ was equal to the estimated distance (Section 10). This relationship can be preserved when covariance variates are present by defining $I$ so that

$$I_{w\alpha} = \sum_{i=1}^{p}M_i\Big(x_{iw\alpha} - \sum_{\xi=1}^{k}b_{i\xi}x_{\xi w\alpha}\Big),$$

and calculating the weights $M_i$ from the equations

$$\sum_{j=1}^{p}E_{ij\cdot\xi}M_j = d'_i.$$

For in that case,

$$M_i = \sum_{j=1}^{p}E^{ij\cdot\xi}d'_j.$$

Consequently the difference between the two sample means of $I$ is

$$\sum_{i=1}^{p}M_i d'_i = \sum_{i,j=1}^{p}E^{ij\cdot\xi}d'_i d'_j,$$

which (apart from the constant $2N$) is equal to $(p + k)D^2$.
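Steps (i) to (v), together with the weights $M_i$ and the adjusted discriminant $I$, can be sketched in plain Python as follows. The data are hypothetical (five observations per sample, two discriminators followed by one covariance variate in each row), and the function names are illustrative. The returned `Q` is $\sum E^{ij\cdot\xi}d'_i d'_j$, so `pk_D2` is $(p + k)D^2 = 2NQ$.

```python
def inverse(A):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def within_ssp(s1, s2):
    """Step (i): within-sample sums of squares and products, 2(N - 1) d.f."""
    q = len(s1[0])
    E = [[0.0] * q for _ in range(q)]
    for sample in (s1, s2):
        m = [sum(row[a] for row in sample) / len(sample) for a in range(q)]
        for row in sample:
            for a in range(q):
                for c in range(q):
                    E[a][c] += (row[a] - m[a]) * (row[c] - m[c])
    return E


def adjusted_discriminant(s1, s2, p):
    """Rows hold p discriminators x_i followed by k covariance variates x_xi."""
    N, q = len(s1), len(s1[0])
    k = q - p
    d = [sum(r[a] for r in s1) / N - sum(r[a] for r in s2) / N for a in range(q)]
    E = within_ssp(s1, s2)                                       # step (i)
    E_cov_inv = inverse([row[p:] for row in E[p:]])              # step (ii)
    b = [[sum(E[i][p + h] * E_cov_inv[h][g] for h in range(k))   # step (iii)
          for g in range(k)] for i in range(p)]
    E_adj = [[E[i][j] - sum(b[i][g] * E[j][p + g] for g in range(k))
              for j in range(p)] for i in range(p)]              # step (iv)
    d_adj = [d[i] - sum(b[i][g] * d[p + g] for g in range(k)) for i in range(p)]
    E_adj_inv = inverse(E_adj)                                   # step (v)
    M = [sum(E_adj_inv[i][j] * d_adj[j] for j in range(p)) for i in range(p)]
    Q = sum(M[i] * d_adj[i] for i in range(p))
    return {"d": d, "b": b, "M": M, "Q": Q, "pk_D2": 2 * N * Q}


def score(row, res, p):
    """I = sum_i M_i (x_i - sum_xi b_{i xi} x_xi) for a single observation."""
    k = len(row) - p
    return sum(res["M"][i] * (row[i] - sum(res["b"][i][g] * row[p + g] for g in range(k)))
               for i in range(p))


s1 = [[2, 3, 1], [4, 5, 2], [3, 7, 2], [5, 6, 3], [4, 4, 1]]   # hypothetical sample 1
s2 = [[1, 1, 2], [2, 4, 1], [0, 2, 1], [1, 3, 3], [2, 2, 3]]   # hypothetical sample 2
res = adjusted_discriminant(s1, s2, p=2)
print(res["pk_D2"], res["M"])
```

The usage lines at the end also illustrate the property stated above: averaging `score` over each sample and differencing gives `Q`, which is $(p + k)D^2$ apart from the constant $2N$.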


14. Distribution of the estimated distance. In the ordinary case, with no covariance variates, the frequency distribution of the estimated distance has been given by several authors, e.g. Hsu [6]. It will be found that in our problem the distribution is essentially the same, except that the quantity $D$ must be multiplied by a new factor and that one set of degrees of freedom entering into the result must be changed from $(n - p + 1)$ to $(n - p - k + 1)$.

Thus far we have assumed that all variates jointly follow a multivariate normal distribution. It is convenient at this stage to regard the covariance variates $x_{\xi w\alpha}$ as fixed from sample to sample, and to use the conditional distribution of the $x_{iw\alpha}$ subject to this restriction. It is well known (e.g. Cramér [7, section 24.6]) that this conditional distribution is the multivariate normal


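$$(2\pi)^{-Np}\,|\sigma^{ij\cdot\xi}|^{N}\exp\Big\{-\tfrac{1}{2}\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}\sum_{w=1}^{2}\sum_{\alpha=1}^{N}(x_{iw\alpha} - \gamma_{iw\alpha})(x_{jw\alpha} - \gamma_{jw\alpha})\Big\}, \tag{23}$$

where

$$\gamma_{iw\alpha} = \mu_{iw} + \sum_{\xi=1}^{k}\beta_{i\xi}(x_{\xi w\alpha} - \mu_{\xi w}).$$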
f»l 
Since the estimated distance is a function of the quantities $E_{ij\cdot\xi}$, $d'_i$, we now find the joint distribution of these variates. The joint distribution of the sums of squares and products $E_{ij\cdot\xi}$ is obtained by quoting a slight extension of a result due to Bartlett [8], which may be stated as follows.

Let the variates $x_{iw\alpha}$ follow the distribution (23) and let

(i) $E_{ij} = \sum_{w=1}^{2}\sum_{\alpha=1}^{N}(x_{iw\alpha} - \bar{x}_{iw\cdot})(x_{jw\alpha} - \bar{x}_{jw\cdot})$ be a typical 'within-samples' sum of squares or products,

(ii) $b_{i\xi} = \sum_{\eta=1}^{k}E_{i\eta}E^{\eta\xi}$ be the 'within-samples' partial regression coefficient of $x_i$ on $x_\xi$, and

(iii) $E_{ij\cdot\xi} = E_{ij} - \sum_{\xi=1}^{k}b_{i\xi}E_{j\xi}$ be the sum of squares or products of deviations from these regressions. Then

(a) the quantities $E_{ij\cdot\xi}$ follow the Wishart distribution

$$c\,|E_{ij\cdot\xi}|^{\frac{1}{2}(n - k - p - 1)}\exp\Big\{-\tfrac{1}{2}\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}E_{ij\cdot\xi}\Big\}$$

with $(n - k)$ d.f., where $n = 2(N - 1)$,

(b) this distribution is independent of that of the $b_{i\xi}$, and

(c) both distributions are independent of that of the means $\bar{x}_{iw\cdot}$, and consequently of that of the differences $d_i = (\bar{x}_{i1\cdot} - \bar{x}_{i2\cdot})$.

The result was proved by Bartlett for a sample from a single population. The extension to the case of two populations is straightforward and will not be given in detail.

From (b) and (c) it follows that the distribution of the $E_{ij\cdot\xi}$ is independent of that of the quantities

$$d'_i = d_i - \sum_{\xi=1}^{k}b_{i\xi}d_\xi.$$

Further, with the $x_\xi$ variates fixed, the $d'_i$ are linear functions of the $x_{iw\alpha}$ with constant coefficients and hence follow a multivariate normal distribution, Wilks [9]. We now find the means and the covariance matrix of this joint distribution. From the joint distribution (23) of the $x_{iw\alpha}$, it is easily seen that

$$E(d_i) = \delta_i + \sum_{\xi=1}^{k}\beta_{i\xi}d_\xi. \tag{24}$$

Also, since by standard regression theory the $b_{i\xi}$ are unbiased estimates of the $\beta_{i\xi}$,

$$E\Big(\sum_{\xi=1}^{k}b_{i\xi}d_\xi\Big) = \sum_{\xi=1}^{k}\beta_{i\xi}d_\xi.$$

Hence, by subtraction,

$$E(d'_i) = \delta_i. \tag{25}$$

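$$\mathrm{Cov}(d'_i, d'_j) = \mathrm{Cov}\Big(d_i - \sum_{\xi=1}^{k}b_{i\xi}d_\xi,\; d_j - \sum_{\eta=1}^{k}b_{j\eta}d_\eta\Big).$$

By (c) the distributions of the $d_i$, $b_{i\xi}$ are independent, so that there will be no contribution from products of the form $d_i b_{j\eta}$. Hence

$$\mathrm{Cov}(d'_i, d'_j) = \mathrm{Cov}(d_i, d_j) + \sum_{\xi,\eta=1}^{k}d_\xi d_\eta\,\mathrm{Cov}(b_{i\xi}, b_{j\eta}). \tag{26}$$

Since $d_i$ is the difference between the means of two samples of size $N$, $\mathrm{Cov}(d_i, d_j)$ is $2\sigma_{ij\cdot\xi}/N$. The covariance of $b_{i\xi}$ and $b_{j\eta}$ is more troublesome. Writing the expressions for these regression coefficients in terms of the original data, we have

$$\mathrm{Cov}(b_{i\xi}, b_{j\eta}) = \sum_{\lambda,\psi=1}^{k}E^{\xi\lambda}E^{\eta\psi}\,\mathrm{Cov}(E_{i\lambda}, E_{j\psi}) = \sum_{\lambda,\psi=1}^{k}E^{\xi\lambda}E^{\eta\psi}\sum_{w,z=1}^{2}\sum_{\alpha,\alpha'=1}^{N}(x_{\lambda w\alpha} - \bar{x}_{\lambda w\cdot})(x_{\psi z\alpha'} - \bar{x}_{\psi z\cdot})\,\mathrm{Cov}(x_{iw\alpha}, x_{jz\alpha'}).$$

Since successive observations are assumed independent, the covariance term vanishes unless $w = z$ and $\alpha = \alpha'$, in which case it equals $\sigma_{ij\cdot\xi}$. Thus

$$\mathrm{Cov}(b_{i\xi}, b_{j\eta}) = \sigma_{ij\cdot\xi}\sum_{\lambda,\psi=1}^{k}E^{\xi\lambda}E^{\eta\psi}E_{\lambda\psi} = \sigma_{ij\cdot\xi}E^{\xi\eta}.$$

Finally, from (26)

$$\mathrm{Cov}(d'_i, d'_j) = \sigma_{ij\cdot\xi}\Big(\frac{2}{N} + \sum_{\xi,\eta=1}^{k}E^{\xi\eta}d_\xi d_\eta\Big) = v\,\sigma_{ij\cdot\xi} \quad\text{(say)}. \tag{27}$$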

Having obtained the distributions of the $E_{ij\cdot\xi}$, $d'_i$, we may apply Hsu's result [6] for the general distribution of Hotelling's $T^2$. In our notation, this may be stated as follows.

If the variates $d'_i/\sqrt{v}$ follow the multivariate normal distribution with means $\delta_i/\sqrt{v}$ and covariance matrix $\sigma_{ij\cdot\xi}$, and if the variates $E_{ij\cdot\xi}$ follow the Wishart distribution with $(n - k)$ d.f. and covariance matrix $\sigma_{ij\cdot\xi}$, the two distributions being independent, then

$$y = \sum_{i,j=1}^{p}E^{ij\cdot\xi}d'_i d'_j/v$$

follows the distribution

$$e^{-\tau}\sum_{h=0}^{\infty}\frac{\tau^h}{h!\,B[\frac{1}{2}p + h,\ \frac{1}{2}(n - k - p + 1)]}\,y^{\frac{1}{2}p + h - 1}(1 + y)^{-\frac{1}{2}(n - k + 1) - h}\,dy, \tag{28}$$

where

$$\tau = \tfrac{1}{2}\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}\delta_i\delta_j/v, \qquad v = \frac{2}{N} + \sum_{\xi,\eta=1}^{k}E^{\xi\eta}d_\xi d_\eta, \qquad n = 2(N - 1).$$

This distribution is, of course, the distribution of the ratio of two independent values of $\chi^2$ with $p$ and $(n - k - p + 1)$ d.f. respectively, in the case where the numerator is non-central.
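The series (28) is easy to evaluate numerically. The Python sketch below assumes $p = 2$, $n_2 = n - k - p + 1 = 31$ and $\tau = 3$ purely for illustration. It sums the Poisson-weighted beta-prime terms with `math.lgamma` for numerical stability and integrates by the midpoint rule. The total mass should come out close to 1, and the mean close to $(2\tau + p)/(n_2 - 2)$, the value used in section 16. Truncation at 60 terms and the integration range are choices made for this example only.

```python
import math


def density_28(y, p, n2, tau, terms=60):
    """Series (28): density of y = (non-central chi^2_p) / (chi^2_{n2}), n2 = n - k - p + 1."""
    if y <= 0:
        return 0.0
    b = 0.5 * n2
    total = 0.0
    for h in range(terms):
        a = 0.5 * p + h
        log_weight = -tau - math.lgamma(h + 1) + (h * math.log(tau) if tau > 0 else 0.0)
        log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
        total += math.exp(log_weight - log_beta
                          + (a - 1) * math.log(y) - (a + b) * math.log1p(y))
        if tau == 0:
            break                    # only the h = 0 term survives in the central case
    return total


def moments(p, n2, tau, upper=8.0, steps=2000):
    """Midpoint-rule integrals of the density and of y times the density on (0, upper)."""
    dy = upper / steps
    mass = mean = 0.0
    for s in range(steps):
        y = (s + 0.5) * dy
        f = density_28(y, p, n2, tau)
        mass += f * dy
        mean += y * f * dy
    return mass, mean


mass, mean = moments(p=2, n2=31, tau=3.0)
print(mass, mean, (2 * 3.0 + 2) / (31 - 2))
```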


15. Tests of significance. This result leads to the extension of Hotelling's $T^2$ test. For if $\delta_i = 0$ $(i = 1, 2, \cdots, p)$, then $\tau$ is zero and

$$\sum_{i,j=1}^{p}E^{ij\cdot\xi}d'_i d'_j$$

is distributed as $vpF/(n - k - p + 1)$, with $p$ and $(n - k - p + 1)$ d.f. The distribution (28) above gives the power function of this test.

We may also wish to apply a test of this type to a subgroup of the discriminators $(i = 1, 2, \cdots, q < p)$. Speaking popularly, this is a test of the null hypothesis that the above variates $x_i$ contribute nothing to the discrimination between the two populations, given that the remaining discriminators and the covariance variates have already been included.¹ To see what is meant more precisely, consider the following transformation:

$$x'_i = x_i - \sum_{l=q+1}^{p}\beta_{il}x_l - \sum_{\xi=1}^{k}\beta_{i\xi}x_\xi, \qquad i = 1, 2, \cdots, q;$$
$$x'_l = x_l - \sum_{\xi=1}^{k}\beta_{l\xi}x_\xi, \qquad l = q + 1, \cdots, p;$$
$$x'_\xi = x_\xi, \qquad \xi = 1, 2, \cdots, k,$$

where the $\beta$'s are population regression coefficients. Then it is not difficult to see that the distance is now given by

$$(p + k)\Delta^2 = \sum_{i,j=1}^{q}\sigma^{ij\cdot l\xi}\delta'_i\delta'_j + \sum_{l,m=q+1}^{p}\sigma^{lm\cdot\xi}\delta_l\delta_m,$$

where $\sigma^{ij\cdot l\xi}$ is the inverse of the covariance matrix of the deviations of the $x_i$ from their regressions on the $x_l$ plus the $x_\xi$, and

$$\delta'_i = \delta_i - \sum_{l=q+1}^{p}\beta_{il}\delta_l.$$

If $\delta'_i = 0$ $(i = 1, 2, \cdots, q)$, the distance is exactly the same as it would be if the variates $x_1, \cdots, x_q$ were omitted. The test in question is therefore a test of the null hypothesis that $\delta'_i = 0$ $(i = 1, 2, \cdots, q)$.

If both the discriminators $x_l$ and the covariance variates $x_\xi$ are regarded as fixed, the method of proof in the previous section provides a test for this hypothesis also. It is found that the sums of squares or products $E_{ij\cdot l\xi}$ follow a Wishart distribution with $(n - k - p + q)$ d.f., while the quantities

$$d'_i = d_i - \sum_{l=q+1}^{p}b_{il}d_l - \sum_{\xi=1}^{k}b_{i\xi}d_\xi$$

are normally distributed, with zero means when the null hypothesis is true. This leads to the result that

$$\sum_{i,j=1}^{q}E^{ij\cdot l\xi}d'_i d'_j$$

is distributed as $v'qF/(n - k - p + 1)$, with $q$ and $(n - k - p + 1)$ d.f., and

$$v' = \frac{2}{N} + \sum_{r,s}E^{rs}d_r d_s,$$

the sum extending over both the covariance variates and the discriminators that are not being tested.

¹ The test is illustrated in Section 7.

16. Discussion of the gain due to covariance. In this section we attempt to construct a measure of the amount that has been gained by the use of the covariance variates. Only a preliminary discussion will be given: a complete discussion would be rather lengthy, owing to the many different uses to which the discriminant function can be put. Perhaps the problem can most easily be seen by considering the effect on Hotelling's generalized $T^2$ test of significance.

The power function of this test, as obtained from equation (28), section 14, depends on four factors: the level of significance that is chosen, the degrees of freedom $n_1$ and $n_2$ in the numerator and denominator of $F$, and the parameter $\tau$. If the covariance variates were ignored, the usual $T^2$ test could be applied to the discriminators alone. In this case we would have

$$n'_1 = p, \qquad n'_2 = n - p + 1, \qquad \tau' = \tfrac{1}{2}\sum\sigma^{ij}\delta_i\delta_j/v', \qquad\text{where } v' = 2/N.$$

With the covariance variates, we have

$$n_1 = p, \qquad n_2 = n - p - k + 1, \qquad \tau = \tfrac{1}{2}\sum\sigma^{ij\cdot\xi}\delta_i\delta_j/v,$$

where

$$v = \frac{2}{N} + \sum_{\xi,\eta=1}^{k}E^{\xi\eta}d_\xi d_\eta.$$

The first point to note is that

$$\sum\sigma^{ij\cdot\xi}\delta_i\delta_j \ge \sum\sigma^{ij}\delta_i\delta_j.$$

This is an instance of the general result that the addition of new variates cannot decrease the value of $p\Delta^2$. To see this, replace the covariance variates by their deviations from their regressions on the discriminators. This transformation gives

$$\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}\delta_i\delta_j = \sum_{i,j=1}^{p}\sigma^{ij}\delta_i\delta_j + \sum_{\xi,\eta=1}^{k}\sigma^{\xi\eta\cdot i}\delta'_\xi\delta'_\eta, \tag{29}$$

where

$$\delta'_\xi = -\sum_{i=1}^{p}\beta_{\xi i}\delta_i.$$

Since the last term on the right of equation (29) is a positive definite quadratic form, the result follows.

Consequently, the first effect of the covariance variates is to make the numerator of $\tau$ greater than that of $\tau'$. As a partial compensation, the denominator $v$ is also greater than $v'$, but it may be shown that the difference in the denominators will usually be trivial if $k$ is small relative to $n$. We therefore expect $\tau$ to be greater than $\tau'$. Now for fixed $n_1$, $n_2$ and significance level, it is well known that the power function (28) is monotone increasing with $\tau$. Hence, other things being equal, the increase in $\tau$ due to the covariance variates leads to a more powerful test.

The two power functions, however, differ in another respect, in that with covariance the value of $n_2$ is reduced from $(n - p + 1)$ to $(n - p - k + 1)$. This decrease in the number of degrees of freedom in the denominator of $F$ will to some extent offset the gain from an increased $\tau$. Examination of Tang's tables [10] indicates, however, that if the degrees of freedom are substantial, this effect will not be important. Moreover, in most practical applications, $k$ is likely to be only 1 or 2. Hence, as a first approximation the effect will be ignored, though to do so tends to overestimate the advantage of covariance.

Suppose now that $\tau = r\tau'$, where $r > 1$. Since $\tau'$ is proportional to $N$, the size of sample taken from each population, we could make $\tau' = \tau$ by increasing the size of sample (when covariance is not used) from $N$ to $rN$. This suggests that the ratio $\tau/\tau'$ can be used, as a first approximation, to measure the relative accuracy obtained with and without the use of covariance. This measure carries approximately the usual interpretation that the inferior method would become as good as the superior method if the sample size for the inferior method were increased by the factor $r$. A further refinement could be made to take account of the difference in the $n_2$ values. By trial and error applied to Tang's tables, one could determine $r'$ so that the two power functions would be as nearly coincident as possible.

In practice, the ratio $\tau/\tau'$ must be estimated from the data. From the power function in equation (28) it is found by integration that the mean value of $y$ is

$$(2\tau + p)/(n_2 - 2),$$

so that, since $y = pF/n_2$, an unbiased estimate of $\tau$ is

$$\hat\tau = \tfrac{1}{2}p\Big[\frac{(n_2 - 2)F}{n_2} - 1\Big].$$

This suggests that the quantity

$$\frac{(n_2 - 2)F}{n_2} - 1$$

should be calculated both with and without covariance. The ratio of the two values will probably not be an unbiased estimate of $\tau/\tau'$, but may be used pending further information about its sampling distribution. This type of calculation is made for the numerical example in section 8.
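As a rough check on the unbiasedness claim, the following Python sketch simulates $y$ as a non-central $\chi^2_p$ (non-centrality $2\tau$) over an independent $\chi^2_{n_2}$, converts to $F = n_2 y/p$, and averages $\hat\tau$. The parameter values, replication count and seed are arbitrary assumptions made for the illustration. For $p = 2$ the estimator $\hat\tau$ is exactly the quantity computed in section 8.

```python
import random


def tau_hat(F, p, n2):
    """Unbiased estimate of tau from F with p and n2 d.f.: (p/2)[(n2 - 2)F/n2 - 1]."""
    return 0.5 * p * ((n2 - 2) * F / n2 - 1)


def mean_tau_hat(tau, p, n2, reps=20000, seed=1):
    """Monte Carlo average of tau_hat when y = chi^2_p(2 tau) / chi^2_{n2}."""
    rng = random.Random(seed)
    shift = (2.0 * tau) ** 0.5           # non-centrality 2*tau placed on one coordinate
    total = 0.0
    for _ in range(reps):
        num = (rng.gauss(0.0, 1.0) + shift) ** 2 + sum(
            rng.gauss(0.0, 1.0) ** 2 for _ in range(p - 1))
        den = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n2))
        F = (num / p) / (den / n2)
        total += tau_hat(F, p, n2)
    return total / reps


print(mean_tau_hat(3.0, p=2, n2=31))     # compare with the true tau = 3.0
```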

17. The case of a row by column classification. Thus far the discussion has been confined to the case where there are only two populations. The technique may also be used when there are more than two populations. The difference $\delta_i$ between the two population means is replaced by some linear function of the population means. As an illustration we consider a row by column classification, the case that arises in the numerical example. No detailed proofs will be given, though it is hoped that the theory can be fairly easily developed from the mathematical model.

A typical variate is $x_{iwz}$, where $i = 1, 2, \cdots, p$ denotes the variate, $w = 1, 2, \cdots, r$ denotes the row and $z = 1, 2, \cdots, c$ denotes the column, there being one observation in each cell. The variates $x_{iwz}$ follow a multivariate normal distribution, with covariance matrix $\sigma_{ij\cdot\xi}$ and means

$$E(x_{iwz}) = \mu_i + \rho_{iw} + \gamma_{iz} + \sum_{\xi=1}^{k}\beta_{i\xi}(x_{\xi wz} - \bar{x}_{\xi\cdot\cdot}),$$

where $\rho_{iw}$ denotes the effect of the row and $\gamma_{iz}$ that of the column. Without loss of generality we may assume that

$$\sum_{w}\rho_{iw} = \sum_{z}\gamma_{iz} = 0.$$

In addition, there exists a known set of variates $t_z$ such that

$$\gamma_{iz} = \delta_i t_z, \qquad \sum_{z}t_z = 0.$$

That is, the column constants have a linear regression on a set of known numbers. The following are the maximum likelihood estimates of the relevant constants.

$$b_{i\xi} = \sum_{\eta=1}^{k}E_{i\eta}E^{\eta\xi}.$$

In the notation used for numerical calculation,

$$d'_i = d_i - \sum_{\xi=1}^{k}b_{i\xi}d_\xi, \qquad\text{where } d_i = \sum_{z}t_z X_{i\cdot z},$$

$$\hat\delta_i = \frac{d'_i}{r\sum_z t_z^2},$$

the quantity $X_{i\cdot z}$ being the column total. Finally

$$rc\,\hat\sigma_{ij\cdot\xi} = E_{ij\cdot\xi} = E_{ij} - \sum_{\xi=1}^{k}b_{i\xi}E_{j\xi}.$$

The distributional properties are similar to those in the two-population case. The quantities $E_{ij\cdot\xi}$ follow a Wishart distribution with $(rc - r - 1 - k)$ d.f. and covariance matrix $\sigma_{ij\cdot\xi}$. The variates $d'_i$ follow a multivariate normal distribution with means $r\sum_z t_z^2\,\delta_i$ and covariance

$$\sigma_{ij\cdot\xi}\Big(r\sum_z t_z^2 + \sum_{\xi,\eta=1}^{k}E^{\xi\eta}d_\xi d_\eta\Big) = v\,\sigma_{ij\cdot\xi} \quad\text{(say)}.$$

Consequently,

$$y = \sum_{i,j=1}^{p}E^{ij\cdot\xi}d'_i d'_j$$

is distributed as $vpF/(rc - r - p - k)$ with $p$ and $(rc - r - p - k)$ d.f. and parameter

$$\tau = \tfrac{1}{2}\Big(r\sum_z t_z^2\Big)^2\sum_{i,j=1}^{p}\sigma^{ij\cdot\xi}\delta_i\delta_j/v.$$

Thus in the numerical example, with $r = 12$, $c = 4$, $p = 2$, $k = 1$, this procedure would have given an $F$ test of the null hypothesis $\tau = 0$, where $F$ has 2 and 33 d.f. However, the contribution from 2 degrees of freedom was deliberately omitted from the quantities $E_{ij}$, so that $F$ actually had 2 and 31 d.f.


PART III 

18. Justification of the 'dummy variate' approach. It remains to show that the method of calculation used in the example (sections 5 and 6) is equivalent to that derived from theory. There are two chief points to prove. First, that the $M$'s found from the equations

$$\sum_{j}E_{ij\cdot\xi}M_j = d'_i \tag{30}$$

are proportional to the corresponding $L$'s found from the equations

$$\sum_{a}S_{ia}L_a = d_i, \tag{31}$$

where the suffix $a$ denotes summation over both $x_i$ and $x_\xi$ variates.

Now, since $S_{ia} = E_{ia} + d_i d_a/240$, equations (31) are the same as

$$\sum_{a}E_{ia}L_a = d_i\Big(1 - \sum_{a}L_a d_a/240\Big). \tag{32}$$

Hence the $L$'s in (31) are proportional to the values found from the equations

$$\sum_{a}E_{ia}L'_a = d_i. \tag{33}$$

But it is well known that if the $L'_\xi$ are eliminated one by one from equations (33), we obtain

$$\sum_{j}E_{ij\cdot\xi}L'_j = d'_i,$$

which is the same as (30). This proves the first point.
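The first point can be checked numerically: solving (31) and (33) gives proportional vectors, and eliminating the covariance coefficient from (33) reproduces (30). A Python sketch, assuming a hypothetical within-sample matrix $E$ ordered as $x_3, x_4, x_0$, a hypothetical vector $d$, and the constant 240 from the example.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


# Hypothetical within-sample SSP matrix E (order x_3, x_4, x_0) and Sum-line d's.
E = [[50.0, 10.0, 5.0],
     [10.0, 40.0, 8.0],
     [5.0, 8.0, 30.0]]
d = [6.0, -4.0, 3.0]
c = 240.0                        # S.S. of the dummy variate, as in the example
p = 2                            # two discriminators first, one covariance variate last

S = [[E[a][b] + d[a] * d[b] / c for b in range(3)] for a in range(3)]   # S_ab
L = solve(S, d)                  # equations (31)
L_prime = solve(E, d)            # equations (33)
ratios = [La / Lp for La, Lp in zip(L, L_prime)]

b = [E[i][2] / E[2][2] for i in range(p)]                 # b_{i xi} with k = 1
E_adj = [[E[i][j] - b[i] * E[j][2] for j in range(p)] for i in range(p)]
d_adj = [d[i] - b[i] * d[2] for i in range(p)]
M = solve(E_adj, d_adj)          # equations (30)
print(ratios, M, L_prime[:p])
```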

The second point to establish is that the $F$ test in the example is the same as that obtained from theory. In section 15, it was shown that

$$\sum_{i,j}E^{ij\cdot\xi}d'_i d'_j/v \tag{34}$$

is distributed as $pF/(n - p - k + 1)$. In the analysis of variance of Table 6, section 6, the quantity following the same distribution was

$$\frac{S_a - S_\xi}{240 - S_a}, \tag{35}$$

where

$$S_a = \sum_{a,b}S^{ab}d_a d_b, \qquad S_\xi = \sum_{\xi,\eta}S^{\xi\eta}d_\xi d_\eta.$$

Since equations (31) and (32) have the same solution, we must have

$$L_a = \sum_{b}S^{ab}d_b = \sum_{b}E^{ab}d_b\Big(1 - \sum_{c}L_c d_c/240\Big) = \sum_{b}E^{ab}d_b\,(1 - S_a/240).$$

Multiplying both sides by $d_a$ and summing over all $a$, we obtain

$$S_a = E_a(1 - S_a/240), \qquad\text{that is,}\qquad S_a = E_a/(1 + E_a/240),$$

where $E_a$ is defined analogously to $S_a$. Similarly

$$S_\xi = E_\xi/(1 + E_\xi/240).$$

Hence

$$\frac{S_a - S_\xi}{240 - S_a} = \frac{E_a - E_\xi}{240 + E_\xi}. \tag{36}$$

Transform the variates $x_i$, $x_\xi$ into variates $x'_i$, $x_\xi$, where $x'_i = x_i - \sum_\xi b_{i\xi}x_\xi$. It is easy to see that this transforms

$$\sum_{a,b}E^{ab}d_a d_b \qquad\text{into}\qquad \sum_{\xi,\eta}E^{\xi\eta}d_\xi d_\eta + \sum_{i,j}E^{ij\cdot\xi}d'_i d'_j.$$

That is,

$$E_a = E_\xi + \sum_{i,j}E^{ij\cdot\xi}d'_i d'_j,$$

since the quantity on the left is invariant under non-singular linear transformations. Hence from (36),

$$\frac{S_a - S_\xi}{240 - S_a} = \sum_{i,j}E^{ij\cdot\xi}d'_i d'_j/v,$$

since in the units of the example $v = 240 + E_\xi$ (section 17, with $r\sum_z t_z^2 = 240$). From (34) and (35), this establishes the equivalence of the $F$ tests. While the proof has been given only for the type of data encountered in the example, the same method will apply to other types of data.
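Identity (36) and the decomposition $E_a = E_\xi + \sum E^{ij\cdot\xi}d'_i d'_j$ can be verified in the same way. Here is a minimal Python sketch with hypothetical $E$ and $d$ ($p = 2$, $k = 1$, covariance variate last), using 240 for the total sum of squares of the dummy variate.

```python
def inverse(A):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def qform(Q, v):
    return sum(Q[i][j] * v[i] * v[j] for i in range(len(v)) for j in range(len(v)))


E = [[50.0, 10.0, 5.0],          # hypothetical within-sample SSP: x_3, x_4, x_0
     [10.0, 40.0, 8.0],
     [5.0, 8.0, 30.0]]
d = [6.0, -4.0, 3.0]
c = 240.0
p = 2

S = [[E[a][b] + d[a] * d[b] / c for b in range(3)] for a in range(3)]
S_a = qform(inverse(S), d)               # total reduction, all variates
S_xi = d[2] ** 2 / S[2][2]               # reduction on the covariance variate alone
E_a = qform(inverse(E), d)
E_xi = d[2] ** 2 / E[2][2]

b = [E[i][2] / E[2][2] for i in range(p)]
E_adj = [[E[i][j] - b[i] * E[j][2] for j in range(p)] for i in range(p)]
d_adj = [d[i] - b[i] * d[2] for i in range(p)]
Q = qform(inverse(E_adj), d_adj)         # sum E^{ij.xi} d'_i d'_j

lhs = (S_a - S_xi) / (c - S_a)           # quantity (35)
print(lhs, (E_a - E_xi) / (c + E_xi), Q / (c + E_xi))
```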

In conclusion, we wish to thank the referees for many helpful suggestions in 
connection with the presentation of this paper. 
ON THE KOLMOGOROV-SMIRNOV LIMIT THEOREMS FOR
EMPIRICAL DISTRIBUTIONS

By W. Feller
Cornell University*

Summary. Unified and simplified derivations are given for the limiting forms 
of the difference (1) between the empirical distribution of a large sample and the 
corresponding theoretical distribution and (2) between the distributions of two 
large samples. 

1. Introduction. Let X\ , • • ■ , X K be mutually independent random vari¬ 
ables with the common cumulative distribution function F(x). Let X* , ■ • • , 
X N be the same set of variables rearranged m increasing order of magnitude. 
The empirical distribution (or sum-polygon) of the sample Xi, • ■ , X N is the step 
function S N (x) defined by 

0 for x < X* 

(1.1) SAx) = ~ for Xt < x < Xt + 1 

1 for x > Xt- 

In other words, N-S n (x ) equals the number of variables X r which do not exceed 
x. We expect intuitively that S N (x) —> F{x) as N —> In fact, if this were 
not so the notions of distribution and sample would be meaningless. The so- 
called (/-criterion of von Mises [4] provides rough estimates for the probable 
deviations of S«{x) from F(x) for certain forms of F{x) (cf. von Mises [4]). A 
much stronger result is due to A. Kolmogorov and is of great interest in the 
theory of non-parametric estimation (Kolmogorov [3]). The maximum of the 
deviation j S N (n) — F(x) | is a random variable D N whose distribution is easily 
seen to be independent of the special form of F(x) provided only that F(x) is 
continuous. 1 The exact distribution of D N is not known, but Kolmogorov found 
that N^n has a limiting distribution. More precisely we have 

Theorem 1 (Kolmogorov [1]). Suppose that $F(x)$ is continuous and define the random variable $D_N$ by

$$D_N = \operatorname*{l.u.b.}_{x} \, |S_N(x) - F(x)|. \tag{1.2}$$

* Research under an ONR contract.

¹ This fact will not be used explicitly in the sequel but follows as a byproduct from our proofs. A simple direct proof consists in considering the random variables $Z_k^* = F(X_k^*)$, which are uniformly distributed; the maximum deviation $D_N'$ of the empirical distribution of the new sample $\{Z_k^*\}$ from the uniform distribution has the same distribution as $D_N$; cf. Kolmogorov [1].
Then for every fixed $z > 0$, as $N \to \infty$,

$$\Pr\{D_N < z N^{-1/2}\} \to L(z), \tag{1.3}$$

where $L(z)$ is the cumulative distribution function which for $z > 0$ is given by either of the following equivalent relations²

$$L(z) = 1 - 2 \sum_{\nu=1}^{\infty} (-1)^{\nu-1} e^{-2\nu^2 z^2} = \frac{(2\pi)^{1/2}}{z} \sum_{\nu=1}^{\infty} e^{-(2\nu-1)^2 \pi^2 / (8 z^2)}. \tag{1.4}$$

For $z \le 0$ we have, of course, $L(z) = 0$.
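As a numerical sketch (not part of the original paper), the two series in (1.4) can be evaluated side by side. Truncating each series at a fixed number of terms is an assumption of the illustration; it is harmless here because both series converge very fast. The function names are hypothetical.

```python
import math


def kolmogorov_L(z: float, terms: int = 100) -> float:
    """First series in (1.4): 1 - 2 * sum_{nu>=1} (-1)^(nu-1) exp(-2 nu^2 z^2)."""
    if z <= 0:
        return 0.0
    s = sum((-1) ** (nu - 1) * math.exp(-2.0 * nu * nu * z * z)
            for nu in range(1, terms + 1))
    return 1.0 - 2.0 * s


def kolmogorov_L_theta(z: float, terms: int = 100) -> float:
    """Second series in (1.4): (2 pi)^(1/2) / z * sum_{nu>=1} exp(-(2nu-1)^2 pi^2 / (8 z^2))."""
    if z <= 0:
        return 0.0
    s = sum(math.exp(-((2 * nu - 1) ** 2) * math.pi ** 2 / (8.0 * z * z))
            for nu in range(1, terms + 1))
    return math.sqrt(2.0 * math.pi) / z * s


if __name__ == "__main__":
    for z in (0.5, 1.0, 1.36, 2.0):
        print(f"z={z}: {kolmogorov_L(z):.10f} {kolmogorov_L_theta(z):.10f}")
```

The two columns should agree to rounding error, and $L(1.36)$ comes out close to 0.95. In keeping with footnote 2, the second series needs fewer terms for small $z$ and the first for large $z$.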

Equally interesting is Smirnov’s result concerning the maximum difference 
between the empirical distributions of two samples with the same cumulative 
distribution. 

Theorem 2 (Smirnov [5]). Let $(X_1, \dots, X_m)$ and $(Y_1, \dots, Y_n)$ be two samples of mutually independent random variables having a common continuous distribution $F(x)$. Let $S_m(x)$ and $T_n(x)$ be the corresponding empirical distribution functions and define a new random variable $D_{m,n}$ by

$$D_{m,n} = \operatorname*{l.u.b.}_{x} \, |S_m(x) - T_n(x)|. \tag{1.5}$$

Put

$$N = \frac{mn}{m+n} \tag{1.6}$$

and suppose that $m \to \infty$, $n \to \infty$ so that

$$\frac{m}{n} \to a, \tag{1.7}$$

where $a$ is a constant. Then for every fixed $z > 0$

$$\Pr\{D_{m,n} < z N^{-1/2}\} \to L(z), \tag{1.8}$$

where $L(z)$ is the same as in (1.4).

[…] An alternative proof of Kolmogorov's theorem is due to Smirnov […]. […] It is, therefore, not surprising that Smirnov's proofs require a […] technique and many auxiliary considerations. It is the purpose of the present paper to present unified proofs of the two theorems which are based on methods of great generality.³ The new proof is not simple but simpler than the original ones. At any rate, it requires essentially only routine manipulations with generating functions and their limiting form, the Laplace transforms. However, the paper aims mostly at a unification of methods.

² The equivalence of the two formulas in (1.4) is a well-known relation often called the transformation formula for theta-functions. We shall use only the first representation in (1.4). The second is more useful for small $z$. A table of $L(z)$ is given in Smirnov [6]. It is reprinted in the present issue of the Annals (pp. 279-281).

³ Among other results which can be proved by […] are […] theorems for ruin and first-passage time problems in the theory […]

As a byproduct of the proof we obtain

Theorem 3. Let $A_N$ be the number of points $x$ where the step-polygon $S_N(x)$ of Theorem 1 leaves the strip $F(x) \pm z N^{-1/2}$. The expected value of the random variable $A_N$ satisfies the asymptotic relation

$$E(A_N) \sim 2 (2\pi N)^{1/2} \{1 - \Phi(2z)\}, \tag{1.9}$$

where $\Phi(z)$ is the normalized Gaussian distribution.

An analogous corollary to Theorem 2 was given by Smirnov [8]: formula (1.9) holds also for the number of intersections of the graph of $S_m(x)$ with the step-polygons $T_n(x) \pm z N^{-1/2}$. These results should come as a surprise to most statisticians. According to Theorem 1 there is a positive probability that $A_N = 0$, and nevertheless $E(A_N)$ is of the order of magnitude $N^{1/2}$. The explanation lies in the fact that if $S_N(x)$ crosses the curve $F(x) + z N^{-1/2}$ at some point then it is extremely likely that $S_N(x)$ will in some neighborhood continue to fluctuate around values $F(x) + z N^{-1/2}$, crossing that curve a great many times. The difference $S_N(x) - F(x)$ exhibits, in the limit $N \to \infty$, many small oscillations. This phenomenon is related to the well-known fact that the path of a particle subject to the Einstein-Wiener diffusion process has no derivatives.

Instead of the absolute values of the differences we may consider the differ¬ 
ences themselves and derive two parallel theorems for the maximum and the 
minimum. As an example we shall prove 

Theorem 4. With the notations and assumptions of'Theorem 1 let 


( 1 - 10 ) 

Z>£ = l.u.b.{£»(*) - F(x)}. 

Then 


( 1 . 11 ) 

Pr{£^ < zN' 11 } -» 1 - e~ 2 ‘ 2 


The proof is simpler than that of Theorem 1 but uses the same method. 
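Simulation can illustrate Theorems 1 and 4, though of course it proves nothing. The sketch below assumes uniform samples on (0, 1). By footnote 1 this entails no loss of generality for continuous $F$. The sample size, number of repetitions, seed and function names are arbitrary illustrative choices.

```python
import math
import random


def kolmogorov_L(z: float, terms: int = 100) -> float:
    """L(z) from the first series in (1.4)."""
    if z <= 0:
        return 0.0
    return 1.0 - 2.0 * sum((-1) ** (nu - 1) * math.exp(-2.0 * nu * nu * z * z)
                           for nu in range(1, terms + 1))


def deviations(sample):
    """Return (D_N, D_N^+) for a sample from the uniform distribution on (0, 1).

    With u_1 < ... < u_N the ordered sample, S_N(x) - F(x) is largest at x = u_i,
    where it equals i/N - u_i; F(x) - S_N(x) approaches u_i - (i-1)/N as x
    increases to u_i.
    """
    u = sorted(sample)
    n = len(u)
    d_plus = max((i + 1) / n - u[i] for i in range(n))
    d_minus = max(u[i] - i / n for i in range(n))
    return max(d_plus, d_minus), d_plus


def simulate(n=400, reps=2000, z=1.0, seed=12345):
    """Monte Carlo estimates of Pr{D_N < z N^(-1/2)} and Pr{D_N^+ < z N^(-1/2)}."""
    rng = random.Random(seed)
    threshold = z / math.sqrt(n)
    two_sided = one_sided = 0
    for _ in range(reps):
        d, d_plus = deviations([rng.random() for _ in range(n)])
        two_sided += d < threshold
        one_sided += d_plus < threshold
    return two_sided / reps, one_sided / reps


if __name__ == "__main__":
    two, one = simulate()
    print("Theorem 1:", two, "limit", kolmogorov_L(1.0))
    print("Theorem 4:", one, "limit", 1.0 - math.exp(-2.0))
```

With these settings the two frequencies should lie within a few hundredths of $L(1) \approx 0.730$ and $1 - e^{-2} \approx 0.865$. The remaining gap reflects both Monte Carlo error and the finite sample size.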

2. Notations and preliminary remarks. For printing convenience it is desirable to avoid complicated subscripts and we shall therefore use the following notation for binomial coefficients:

$$C(n, k) = \binom{n}{k}. \tag{2.1}$$

Similarly, for the general term of the binomial distribution we shall write

$$B(n, k; p) = C(n, k)\, p^k (1 - p)^{n-k}. \tag{2.2}$$
If $A$ is an event, $\bar{A}$ will denote its negation (complementary event). Finally

$$\Pr\{A \mid B\} \tag{2.3}$$

denotes the conditional probability of $A$ for given $B$.

Our proofs depend on a special case of the continuity theorem for characteristic functions. Since we shall deal only with probability density functions $f(t)$ which vanish for $t < 0$, it is preferable to use, instead of the characteristic function, the Laplace transform

$$\varphi(s) = \int_0^\infty e^{-st} f(t)\, dt. \tag{2.4}$$

(This amounts to using the variable $-s$ instead of the usual $is$, and therefore $\varphi(s)$ obeys the formal rules for characteristic functions.)

For any sequence $\{u_k\}$ $(k = 1, 2, \dots)$ of non-negative numbers we define the generating function $u(\lambda)$ by

$$u(\lambda) = \sum_{k=1}^{\infty} u_k \lambda^k. \tag{2.5}$$

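Now let $\delta > 0$ be fixed and consider the step-function $f_\delta(t)$ defined by

$$f_\delta(t) = u_k \quad \text{for } (k-1)\delta < t \le k\delta \tag{2.6}$$

$(k = 1, 2, \dots;\ f_\delta(t) = 0$ for $t \le 0)$. Its Laplace transform is

$$\varphi_\delta(s) = \int_0^\infty e^{-st} f_\delta(t)\, dt = \frac{e^{\delta s} - 1}{s}\, u(e^{-\delta s}), \tag{2.7}$$

and the factor $(e^{\delta s} - 1)/s$ is asymptotically equal to $\delta$ as $\delta \to 0$. We have, therefore, the continuity theorem: If, as $\delta \to 0$,

$$\delta\, u(e^{-\delta s}) \to \varphi(s), \tag{2.8}$$

then for every fixed $t > 0$

$$u_k \to f(t) \quad \text{when } k\delta \to t; \tag{2.9}$$

conversely, if (2.9) holds then (2.8) is true.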

3. Proof of Theorem 1. Since $F(x)$ is continuous it is possible to define numbers $x_k$ such that

$$F(x_k) = \frac{k}{N} \qquad (k = 1, 2, \dots, N - 1). \tag{3.1}$$

This definition is unique except when $F(x) = k/N$ within an entire interval, in which case we define $x_k$ as the left endpoint of that interval.

Let $c > 0$ be an integer. We shall evaluate the probability of the event $D_N > c/N$, and we shall later put

$$c = z N^{1/2}, \qquad N \to \infty. \tag{3.2}$$
Suppose first that for some particular $x$

$$S_N(x) - F(x) > \frac{c}{N}. \tag{3.3}$$

This point $x$ is contained in a maximal interval in which (3.3) holds, and at the right endpoint $\xi$ of this interval we shall have

$$S_N(\xi) - F(\xi) = \frac{c}{N}. \tag{3.4}$$

Now $S_N(\xi)$ is necessarily a number of the form $r/N$ with an integer $r$. Since $c$ is an integer, also $F(\xi) = k/N$ and hence $\xi = x_k$ for some $k$. From (3.4) we conclude that

$$X_{k+c}^* < x_k, \qquad X_{k+c+1}^* > x_k, \tag{3.5}$$

or in other words: exactly $k + c$ among the $N$ variables $X_\nu$ are smaller than $x_k$. Denote this event by $A_k(c)$. The inequality (3.3) takes place for some $x$ if, and only if, at least one among the events $A_1(c), \dots, A_N(c)$ occurs. The argument applies equally to $c < 0$ and shows that the event $D_N > c/N$ occurs if, and only if, at least one among the events

$$A_1(c),\ A_1(-c),\ A_2(c),\ A_2(-c),\ \dots,\ A_N(c),\ A_N(-c) \tag{3.6}$$

occurs.

Let $U_r$ and $V_r$ be the events that in the sequence (3.6) the first event to occur is $A_r(c)$ or $A_r(-c)$, respectively. More formally, the events $U_r$ and $V_r$ are defined by

$$\begin{aligned} U_r &= \bar{A}_1(c)\bar{A}_1(-c) \cdots \bar{A}_{r-1}(c)\bar{A}_{r-1}(-c)A_r(c), \\ V_r &= \bar{A}_1(c)\bar{A}_1(-c) \cdots \bar{A}_{r-1}(c)\bar{A}_{r-1}(-c)\bar{A}_r(c)A_r(-c). \end{aligned} \tag{3.7}$$

These events are mutually exclusive and therefore

$$\Pr\{D_N > c/N\} = \sum_{r=1}^{N} \Pr\{U_r\} + \sum_{r=1}^{N} \Pr\{V_r\}. \tag{3.8}$$

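From the very definitions we have the following two fundamental relations

$$\begin{aligned} \Pr\{A_k(c)\} &= \sum_{r=1}^{k} \Pr\{U_r\} \Pr\{A_k(c) \mid A_r(c)\} + \sum_{r=1}^{k} \Pr\{V_r\} \Pr\{A_k(c) \mid A_r(-c)\}, \\ \Pr\{A_k(-c)\} &= \sum_{r=1}^{k} \Pr\{U_r\} \Pr\{A_k(-c) \mid A_r(c)\} + \sum_{r=1}^{k} \Pr\{V_r\} \Pr\{A_k(-c) \mid A_r(-c)\}. \end{aligned} \tag{3.9}$$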
This is a system of $2N$ linear equations for the $2N$ unknowns $\Pr\{U_r\}$ and $\Pr\{V_r\}$, and we proceed to solve it by the method of generating functions.

By definition of $x_k$ we have $\Pr\{X_\nu < x_k\} = k/N$. The probability of the event $A_k(c)$ (that the same inequality holds for exactly $k + c$ different $\nu$'s) is therefore given by

$$\Pr\{A_k(c)\} = B(N, k + c; k/N) \tag{3.10}$$

(cf. (2.2)). Similarly, it is readily verified that for $r < k$

$$\Pr\{A_k(c) \mid A_r(c)\} = B\bigl(N - r - c,\ k - r;\ (k - r)/(N - r)\bigr) \tag{3.11}$$

and

$$\Pr\{A_k(c) \mid A_r(-c)\} = B\bigl(N - r + c,\ k - r + 2c;\ (k - r)/(N - r)\bigr). \tag{3.12}$$

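The last three equations hold also for $c < 0$. They can be written in a more convenient form in terms of the quantities

$$p_k(c) = e^{-k} \frac{k^{k+c}}{(k+c)!}. \tag{3.13}$$

In fact

$$\Pr\{A_k(c)\} = \frac{p_k(c)\, p_{N-k}(-c)}{p_N(0)}, \tag{3.14}$$

$$\Pr\{A_k(c) \mid A_r(c)\} = \frac{p_{k-r}(0)\, p_{N-k}(-c)}{p_{N-r}(-c)}, \tag{3.15}$$

$$\Pr\{A_k(c) \mid A_r(-c)\} = \frac{p_{k-r}(2c)\, p_{N-k}(-c)}{p_{N-r}(c)}. \tag{3.16}$$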

If the expressions (3.14)-(3.16) are introduced into (3.9), the second factor in the numerator cancels. A further simplification is achieved on introducing new sets of unknowns

$$u_r = \frac{p_N(0)\, \Pr\{U_r\}}{p_{N-r}(-c)}, \qquad v_r = \frac{p_N(0)\, \Pr\{V_r\}}{p_{N-r}(c)}. \tag{3.17}$$

The fundamental equations (3.9) then reduce to

$$\begin{aligned} p_k(c) &= \sum_{r=1}^{k} u_r\, p_{k-r}(0) + \sum_{r=1}^{k} v_r\, p_{k-r}(2c), \\ p_k(-c) &= \sum_{r=1}^{k} u_r\, p_{k-r}(-2c) + \sum_{r=1}^{k} v_r\, p_{k-r}(0). \end{aligned} \tag{3.18}$$

This system is of the convolution type and can therefore be solved by means of generating functions. The essential point is that the $p_k(c)$ are defined for all $k$ and that the system (3.18) therefore determines the unknowns $u_r$ and $v_r$ for all $r > 0$. We put

$$u(\lambda) = \sum_{k=1}^{\infty} u_k \lambda^k, \qquad v(\lambda) = \sum_{k=1}^{\infty} v_k \lambda^k \tag{3.19}$$
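and

$$p(\lambda; c) = N^{-1/2} \sum_{k=0}^{\infty} p_k(c)\, \lambda^k, \tag{3.20}$$

where $p_0(0) = 1$ and $p_0(c) = 0$ for $c \ne 0$. (The factor $N^{-1/2}$ serves to simplify formulas.) Then obviously

$$\begin{aligned} p(\lambda; c) &= u(\lambda)\, p(\lambda; 0) + v(\lambda)\, p(\lambda; 2c), \\ p(\lambda; -c) &= u(\lambda)\, p(\lambda; -2c) + v(\lambda)\, p(\lambda; 0). \end{aligned} \tag{3.21}$$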

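From here we find $u(\lambda)$ and $v(\lambda)$. Equation (3.17) then determines $\Pr\{U_r\}$ and $\Pr\{V_r\}$. Actually we are interested only in the two sums occurring in (3.8). We put

$$\xi_k = \sum_{r=1}^{k} u_r\, p_{k-r}(-c), \qquad \eta_k = \sum_{r=1}^{k} v_r\, p_{k-r}(c). \tag{3.22}$$

Again, the $\xi_k$ and $\eta_k$ are defined for all $k$ (also $k > N$). From (3.17) we have

$$\sum_{r=1}^{N} \Pr\{U_r\} = \frac{\xi_N}{p_N(0)}, \qquad \sum_{r=1}^{N} \Pr\{V_r\} = \frac{\eta_N}{p_N(0)}, \tag{3.23}$$

and hence finally, by (3.8),

$$\Pr\{D_N > c/N\} = \frac{\xi_N + \eta_N}{p_N(0)}. \tag{3.24}$$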
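For finite $N$ the system (3.18) is triangular: because $p_0(0) = 1$ and $p_0(c) = 0$ for $c \ne 0$, the $r = k$ terms reduce to $u_k$ and $v_k$. It can therefore be solved step by step and combined with (3.22) to (3.24) to give $\Pr\{D_N > c/N\}$ exactly. This is only a numerical sketch of that route, not a procedure proposed in the paper. It assumes $F$ uniform (no loss of generality, by footnote 1). The function names and the Monte Carlo comparison are illustrative additions.

```python
import math
import random


def poisson_p(k: int, c: int) -> float:
    """p_k(c) of (3.13); zero when k + c < 0, and p_0(0) = 1."""
    if k + c < 0:
        return 0.0
    if k == 0:
        return 1.0 if c == 0 else 0.0
    return math.exp(-k + (k + c) * math.log(k) - math.lgamma(k + c + 1))


def prob_D_exceeds(N: int, c: int) -> float:
    """Pr{D_N > c/N} for an integer c > 0, via (3.18) and (3.22)-(3.24)."""
    # tables of p_j(.) for j = 0..N
    p0 = [poisson_p(j, 0) for j in range(N + 1)]
    pc = [poisson_p(j, c) for j in range(N + 1)]
    pmc = [poisson_p(j, -c) for j in range(N + 1)]
    p2c = [poisson_p(j, 2 * c) for j in range(N + 1)]
    pm2c = [poisson_p(j, -2 * c) for j in range(N + 1)]
    u = [0.0] * (N + 1)  # u[r], r = 1..N; index 0 unused
    v = [0.0] * (N + 1)
    for k in range(1, N + 1):
        su = sum(u[r] * p0[k - r] + v[r] * p2c[k - r] for r in range(1, k))
        sv = sum(u[r] * pm2c[k - r] + v[r] * p0[k - r] for r in range(1, k))
        u[k] = pc[k] - su   # first equation of (3.18), solved for u_k
        v[k] = pmc[k] - sv  # second equation of (3.18), solved for v_k
    xi = sum(u[r] * pmc[N - r] for r in range(1, N + 1))   # xi_N of (3.22)
    eta = sum(v[r] * pc[N - r] for r in range(1, N + 1))   # eta_N of (3.22)
    return (xi + eta) / p0[N]                              # (3.24)


def simulate_D_exceeds(N: int, c: int, reps: int, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr{D_N > c/N} for uniform samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        u = sorted(rng.random() for _ in range(N))
        d = max(max((i + 1) / N - u[i], u[i] - i / N) for i in range(N))
        hits += d > c / N
    return hits / reps


if __name__ == "__main__":
    print(prob_D_exceeds(3, 1))  # equals 7/9 up to rounding
    print(prob_D_exceeds(50, 5), simulate_D_exceeds(50, 5, 10000, seed=7))
```

For $N = 3$, $c = 1$ the event $D_3 \le 1/3$ means exactly one observation in each third of $(0, 1)$, which has probability $2/9$, so the value $7/9$ can be checked by hand. For larger $N$ the exact value approaches $1 - L(z)$ with $z = c N^{-1/2}$.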


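In (3.22) we find again simple convolutions leading to products of the corresponding generating functions. Thus

$$\xi(\lambda) = \sum_{k=1}^{\infty} \frac{\xi_k}{p_N(0)} \lambda^k = \frac{N^{1/2}\, u(\lambda)\, p(\lambda; -c)}{p_N(0)}, \qquad \eta(\lambda) = \sum_{k=1}^{\infty} \frac{\eta_k}{p_N(0)} \lambda^k = \frac{N^{1/2}\, v(\lambda)\, p(\lambda; c)}{p_N(0)}. \tag{3.25}$$

We now pass to a study of the limiting form of these generating functions as $N \to \infty$ and $c \to \infty$ in accordance with (3.2). Consider a fixed $t > 0$ and suppose that

$$N \to \infty, \qquad k/N \to t. \tag{3.26}$$

From well-known properties of the Poisson distribution it follows then that

$$N^{1/2} p_k(c) \to (2\pi t)^{-1/2} \exp(-z^2/2t). \tag{3.27}$$

Accordingly, the continuity theorem of section 2 implies (as can be verified directly) that

$$p(e^{-s/N};\, z N^{1/2}) \to (2\pi)^{-1/2} \int_0^\infty t^{-1/2} \exp(-ts - z^2/2t)\, dt = (2s)^{-1/2} \exp\bigl(-(2sz^2)^{1/2}\bigr) \tag{3.28}$$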
(the last integral is well known and can be evaluated by elementary methods; the square root is always positive). We see in particular that the limiting form is the same for $p(\lambda; c)$ and $p(\lambda; -c)$. It follows therefore from (3.21) directly that

$$\lim u(e^{-s/N}) = \lim v(e^{-s/N}) = \frac{\exp\bigl(-(2sz^2)^{1/2}\bigr)}{1 + \exp\bigl(-(8sz^2)^{1/2}\bigr)}. \tag{3.29}$$
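The limit (3.28) can be checked numerically. For large $N$, the scaled generating function $p(e^{-s/N}; zN^{1/2})$ of (3.20) should be close to $(2s)^{-1/2}\exp(-(2sz^2)^{1/2})$. In this sketch the infinite series is truncated at $k = 20N$, where the factor $e^{-sk/N}$ makes the remainder negligible for the values of $s$ used. The cut-off, the rounding of $zN^{1/2}$ to an integer, and the names are assumptions of the illustration.

```python
import math


def poisson_p(k: int, c: int) -> float:
    """p_k(c) of (3.13); zero when k + c < 0, and p_0(0) = 1."""
    if k + c < 0:
        return 0.0
    if k == 0:
        return 1.0 if c == 0 else 0.0
    return math.exp(-k + (k + c) * math.log(k) - math.lgamma(k + c + 1))


def scaled_generating_function(N: int, z: float, s: float, cutoff: int = 20) -> float:
    """N^(-1/2) * sum_{k>=0} p_k(c) e^(-sk/N) as in (3.20), with c = round(z N^(1/2))."""
    c = round(z * math.sqrt(N))
    total = sum(poisson_p(k, c) * math.exp(-s * k / N) for k in range(cutoff * N))
    return total / math.sqrt(N)


def limit_3_28(z: float, s: float) -> float:
    """Right-hand side of (3.28): (2s)^(-1/2) exp(-(2 s z^2)^(1/2))."""
    return math.exp(-math.sqrt(2.0 * s * z * z)) / math.sqrt(2.0 * s)


if __name__ == "__main__":
    for N in (100, 2500, 10000):
        print(N, scaled_generating_function(N, 0.5, 1.0), limit_3_28(0.5, 1.0))
```

The discrepancy shrinks roughly like $N^{-1/2}$, which is the order of the correction to the normal approximation of the Poisson probabilities in (3.27).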


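Using (3.29) and the fact that $p_N(0) \sim (2\pi N)^{-1/2}$, we conclude from (3.25) that

$$\lim_{N\to\infty} N^{-1} \xi(e^{-s/N}) = \lim_{N\to\infty} N^{-1} \eta(e^{-s/N}) = \Bigl(\frac{2\pi}{2s}\Bigr)^{1/2} \frac{\exp\bigl(-(8sz^2)^{1/2}\bigr)}{1 + \exp\bigl(-(8sz^2)^{1/2}\bigr)} = \varphi(s). \tag{3.30}$$

Expanding $\varphi(s)$ into a geometric series we get

$$\varphi(s) = \Bigl(\frac{\pi}{s}\Bigr)^{1/2} \sum_{\nu=1}^{\infty} (-1)^{\nu-1} \exp\bigl(-(8s\nu^2 z^2)^{1/2}\bigr). \tag{3.31}$$

From the evaluation of the integral in (3.28) we conclude that $\varphi(s)$ is the Laplace transform of

$$f(t) = t^{-1/2} \sum_{\nu=1}^{\infty} (-1)^{\nu-1} \exp(-2\nu^2 z^2/t). \tag{3.32}$$

The continuity theorem of section 2 in conjunction with (3.30) and (3.26) shows that

$$\lim_{N\to\infty} \frac{\xi_N}{p_N(0)} = \lim_{N\to\infty} \frac{\eta_N}{p_N(0)} = f(1). \tag{3.33}$$

In view of (3.24) this accomplishes the proof, since $2f(1) = 2\sum_{\nu}(-1)^{\nu-1} e^{-2\nu^2 z^2} = 1 - L(z)$ by (1.4).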


4. Proof of Theorem 4. This proof is simpler than the preceding one inasmuch as we are now interested only in the events $A_k(c)$ for $c > 0$. This time we define $U_r$ as the event that $r$ is the smallest subscript for which $A_r(c)$ occurs, that is, $U_r = \bar{A}_1(c)\bar{A}_2(c) \cdots \bar{A}_{r-1}(c) A_r(c)$; no analogue to the event $V_r$ will be used. With the same notations as before, (3.9) is replaced by

$$\Pr\{A_k(c)\} = \sum_{r=1}^{k} \Pr\{U_r\} \Pr\{A_k(c) \mid A_r(c)\}, \tag{4.1}$$

and hence (3.21) by

$$p(\lambda; c) = u(\lambda)\, p(\lambda; 0). \tag{4.2}$$

Here $p(\lambda; c)$ is the same as before, so that (cf. (3.29))

$$\lim_{N\to\infty} u(e^{-s/N}) = \exp\bigl(-(2sz^2)^{1/2}\bigr). \tag{4.3}$$

Again, the first equation (3.25) holds without change and therefore we get instead of (3.30)

$$\lim_{N\to\infty} N^{-1} \xi(e^{-s/N}) = \Bigl(\frac{\pi}{s}\Bigr)^{1/2} \exp\bigl(-(8sz^2)^{1/2}\bigr). \tag{4.4}$$
From (3.28) this is the Laplace transform of

$$f(t) = t^{-1/2} \exp(-2z^2/t). \tag{4.5}$$

As before we conclude that $\xi_N / p_N(0) \to f(1) = e^{-2z^2}$, which accomplishes the proof.

5. Proof of Theorem 3. We have seen in section 3 that the intervals in which (3.3) holds are in a one-to-one correspondence with the events $A_k(c)$. Hence

$$E(A_N) = \sum_k \Pr\{A_k(c)\} + \sum_k \Pr\{A_k(-c)\}. \tag{5.1}$$

To evaluate the sums we use (3.10). If $N \to \infty$ and again $c = z N^{1/2}$, $k/N \to t$, then by the central limit theorem

$$B(N, k + c; k/N) \sim \{2\pi N t(1-t)\}^{-1/2} \exp\bigl(-z^2/2t(1-t)\bigr). \tag{5.2}$$

It follows then from (3.10) that

$$N^{-1/2} \sum_k \Pr\{A_k(c)\} \to (2\pi)^{-1/2} \int_0^1 \{t(1-t)\}^{-1/2} \exp\bigl(-z^2/2t(1-t)\bigr)\, dt. \tag{5.3}$$

Call the right-hand member $R$. After the substitution $t = \sin^2(\phi/2)$ we find

$$\begin{aligned} \frac{dR}{dz} &= -8(2\pi)^{-1/2} z \int_0^{\pi/2} \sin^{-2}\phi\, \exp(-2z^2/\sin^2\phi)\, d\phi \\ &= 8(2\pi)^{-1/2} z \exp(-2z^2) \int_{\phi=0}^{\pi/2} \exp(-2z^2 \cot^2\phi)\, d(\cot\phi) \\ &= -2\exp(-2z^2). \end{aligned} \tag{5.4}$$

Since $R \to 0$ as $z \to \infty$ we conclude that

$$R = 2\int_z^\infty \exp(-2x^2)\, dx = \{1 - \Phi(2z)\}(2\pi)^{1/2}. \tag{5.5}$$

The same asymptotic estimate holds for the other sum in (5.1), and hence Theorem 3 is proved.
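Both steps of the proof of Theorem 3 can be checked numerically. The sketch below evaluates the integral in (5.3) after the substitution $t = \sin^2(\phi/2)$. That substitution turns it into $(2\pi)^{-1/2}\int_0^\pi \exp(-2z^2/\sin^2\phi)\,d\phi$ and removes the endpoint singularities; the result is compared with the closed form (5.5). The sketch also compares the exact finite-$N$ value of (5.1), computed from (3.10), with the asymptotic formula (1.9). The choice $N = 10^4$, $z = 1/2$ (so that $c = 50$ is an integer) and the function names are arbitrary.

```python
import math


def binom_pmf(n: int, k: int, p: float) -> float:
    """B(n, k; p) of (2.2) for 0 < p < 1, computed through logarithms."""
    if k < 0 or k > n:
        return 0.0
    log_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_c + k * math.log(p) + (n - k) * math.log1p(-p))


def expected_crossings_exact(N: int, c: int) -> float:
    """Right-hand side of (5.1), with Pr{A_k(+-c)} = B(N, k +- c; k/N) from (3.10)."""
    return sum(binom_pmf(N, k + c, k / N) + binom_pmf(N, k - c, k / N) for k in range(1, N))


def expected_crossings_asymptotic(N: int, z: float) -> float:
    """Right-hand side of (1.9); 1 - Phi(2z) is written as erfc(sqrt(2) z) / 2."""
    return 2.0 * math.sqrt(2.0 * math.pi * N) * 0.5 * math.erfc(math.sqrt(2.0) * z)


def R_quadrature(z: float, steps: int = 20000) -> float:
    """Midpoint rule for (2 pi)^(-1/2) * integral_0^pi exp(-2 z^2 / sin^2 phi) d phi."""
    h = math.pi / steps
    total = sum(math.exp(-2.0 * z * z / math.sin((j + 0.5) * h) ** 2) for j in range(steps))
    return total * h / math.sqrt(2.0 * math.pi)


def R_closed_form(z: float) -> float:
    """(5.5): R = {1 - Phi(2z)} (2 pi)^(1/2)."""
    return math.sqrt(2.0 * math.pi) * 0.5 * math.erfc(math.sqrt(2.0) * z)


if __name__ == "__main__":
    print(R_quadrature(0.5), R_closed_form(0.5))
    N, z = 10000, 0.5
    c = round(z * math.sqrt(N))
    print(expected_crossings_exact(N, c), expected_crossings_asymptotic(N, z))
```

Because the integrand and all its derivatives vanish at $\phi = 0$ and $\phi = \pi$, the plain midpoint rule is already very accurate here.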


6. Proof of Theorem 2. Reorder the two samples in ascending order of magnitude and denote the rearranged samples by $(X_1^*, \dots, X_m^*)$ and $(Y_1^*, \dots, Y_n^*)$. When speaking of the graphs of the empirical distributions $S_m(x)$ and $T_n(x)$ we shall suppose that they have been completed by adding vertical segments so that the graphs become step-polygons. We shall put

$$p = \frac{m}{m+n}, \qquad q = \frac{n}{m+n}. \tag{6.1}$$

Then, according to (1.6) and (1.7),

$$\frac{p}{q} \to a, \qquad N = pn = qm. \tag{6.2}$$
Without loss of generality we shall suppose that

$$p \le q. \tag{6.3}$$

In order to carry over the proof of Theorem 1 it is necessary to define the events $A_k(c)$ in a judicious manner. For every integer $k > 0$ let $\nu_k$ be the number of variables $X_\nu$ which are smaller than $Y_k^*$. In other words, $\nu_k$ is defined as the integer for which

$$X_{\nu_k}^* < Y_k^* < X_{\nu_k + 1}^*. \tag{6.4}$$

Finally put

$$\alpha_k = [pk/q], \tag{6.5}$$

where, as usual, $[x]$ denotes the greatest integer contained in $x$.

For $0 < k \le n$ let $A_k(c)$ be the event that

$$\nu_k = \alpha_{k+c}. \tag{6.6}$$

The possibility of applying the proof of section 3 depends on the following

Lemma. Whenever

$$D_{m,n} > \frac{c}{n} > 0, \tag{6.7}$$

then at least one among the events $A_1(c), A_1(-c), \dots, A_n(c), A_n(-c)$ occurs. Conversely, if one of these events occurs then

$$D_{m,n} > \frac{c - n/m}{n}. \tag{6.8}$$

Proof. If (6.7) holds then either for some $x_0$

$$S_m(x_0) - T_n(x_0) > \frac{c}{n} \tag{6.9}$$

or the reversed inequality holds with $c$ replaced by $-c$. It suffices to consider the case (6.9). For sufficiently large $x$ we have $S_m(x) = T_n(x) = 1$. Hence the graphs of $S_m(x)$ and $T_n(x) + c/n$ must intersect at an abscissa $\xi > x_0$. The point of intersection lies necessarily on a horizontal segment of the graph of $S_m(x)$ and a vertical segment of $T_n(x) + c/n$. Hence there exists a $k$ such that $\xi = Y_k^*$ and, moreover,

$$T_n(\xi-) + \frac{c}{n} \le S_m(\xi) \le T_n(\xi+) + \frac{c}{n}. \tag{6.10}$$

This amounts to saying that

$$\frac{k - 1 + c}{n} \le \frac{\nu_k}{m} \le \frac{k + c}{n}. \tag{6.11}$$

In view of (6.3) and (6.5) this relation implies (6.6).
Conversely, suppose that the event $A_k(c)$ occurs and let $c > 0$. Put again $\xi = Y_k^*$. Then, by definition,

$$S_m(\xi) = \frac{\nu_k}{m} = \frac{\alpha_{k+c}}{m}, \qquad T_n(\xi) = \frac{k}{n}. \tag{6.12}$$

It follows that

$$S_m(\xi) > \frac{k + c}{n} - \frac{1}{m} = T_n(\xi) + \frac{c}{n} - \frac{1}{m}, \tag{6.13}$$

which in turn implies (6.8). This proves the lemma.

Theorem 2 is concerned with values of $c$ such that $c n^{-1} = z N^{-1/2}$; in passing to the limit we must therefore put

$$c = z (n/p)^{1/2}. \tag{6.14}$$

Accordingly, the relations (6.7) and (6.8) are asymptotically equivalent, and our lemma shows that, asymptotically, the probability of (6.7) is the same as the probability that at least one among the events $A_1(c), \dots, A_n(-c)$ occurs. To evaluate this probability we proceed exactly as in section 3. The events $U_r$ and $V_r$ defined by (3.7) and the fundamental relations (3.9) hold again. However, (3.10)-(3.12) have to be replaced by new evaluations.

It is easily seen that the probability that exactly $r$ among the $X_\nu$ are smaller than $Y_k^*$ is the same as the probability to extract exactly $r$ white balls before the $k$-th black ball from an urn containing $m$ white and $n$ black balls (assuming that all orders are equally likely and that balls are not replaced). In this way one finds

$$\Pr\{A_k(c)\} = \frac{C(\alpha_{k+c} + k - 1,\ k - 1)\, C(m + n - \alpha_{k+c} - k,\ n - k)}{C(m + n,\ n)}, \tag{6.15}$$

$$\Pr\{A_k(c) \mid A_r(c)\} = \frac{C(\alpha_{k+c} - \alpha_{r+c} + k - r - 1,\ k - r - 1)\, C(m + n - \alpha_{k+c} - k,\ n - k)}{C(m + n - \alpha_{r+c} - r,\ n - r)}, \tag{6.16}$$

$$\Pr\{A_k(c) \mid A_r(-c)\} = \frac{C(\alpha_{k+c} - \alpha_{r-c} + k - r - 1,\ k - r - 1)\, C(m + n - \alpha_{k+c} - k,\ n - k)}{C(m + n - \alpha_{r-c} - r,\ n - r)}. \tag{6.17}$$
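The urn argument behind (6.15) can be confirmed by brute force for small $m$ and $n$. The probability that exactly $j$ of the $m$ white balls precede the $k$-th of the $n$ black balls is $C(j + k - 1, k - 1)\,C(m + n - j - k, n - k)/C(m + n, n)$, and (6.15) is this expression with $j = \alpha_{k+c}$. The enumeration below treats all $C(m + n, n)$ placements of the black balls as equally likely, as the text assumes. The names are illustrative.

```python
from itertools import combinations
from math import comb


def whites_before_kth_black(m: int, n: int, k: int, j: int) -> float:
    """C(j+k-1, k-1) C(m+n-j-k, n-k) / C(m+n, n), the urn probability used in (6.15)."""
    if j < 0 or j > m:
        return 0.0
    return comb(j + k - 1, k - 1) * comb(m + n - j - k, n - k) / comb(m + n, n)


def brute_force(m: int, n: int, k: int, j: int) -> float:
    """Enumerate the equally likely positions of the n black balls among m + n places."""
    hits = total = 0
    for blacks in combinations(range(m + n), n):
        total += 1
        # blacks is increasing; the k-th black ball is at place blacks[k - 1] and is
        # preceded by k - 1 black balls, so the remaining predecessors are white
        hits += (blacks[k - 1] - (k - 1)) == j
    return hits / total


if __name__ == "__main__":
    print(whites_before_kth_black(4, 3, 2, 1), brute_force(4, 3, 2, 1))
```

The conditional probabilities (6.16) and (6.17) follow from the same count applied to the balls that remain after the $r$-th black ball.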

The second binomial coefficient in the numerator is common to the three expressions (6.15)-(6.17) and cancels when the expressions are introduced into (3.9). These fundamental relations assume a more natural form if the occurring binomial coefficients are enlarged to terms of a binomial distribution. It is easily verified that the first of the equations (3.9) reduces to

$$\frac{B(\alpha_{k+c} + k - 1,\ k - 1;\ q)}{B(m + n,\ n;\ q)} = \sum_{r=1}^{k} \Pr\{U_r\} \frac{B(\alpha_{k+c} - \alpha_{r+c} + k - r - 1,\ k - r - 1;\ q)}{B(m + n - \alpha_{r+c} - r,\ n - r;\ q)} + \sum_{r=1}^{k} \Pr\{V_r\} \frac{B(\alpha_{k+c} - \alpha_{r-c} + k - r - 1,\ k - r - 1;\ q)}{B(m + n - \alpha_{r-c} - r,\ n - r;\ q)}. \tag{6.18}$$
The second equation is obtained on replacing the combination $k + c$ by $k - c$.

Instead of (3.17) we put

$$u_r = \Pr\{U_r\} \frac{B(m + n,\ n;\ q)}{B(m + n - \alpha_{r+c} - r,\ n - r;\ q)}, \qquad v_r = \Pr\{V_r\} \frac{B(m + n,\ n;\ q)}{B(m + n - \alpha_{r-c} - r,\ n - r;\ q)}. \tag{6.19}$$

Then (6.18) becomes

$$B(\alpha_{k+c} + k - 1,\ k - 1;\ q) = \sum_{r=1}^{k} u_r\, B(\alpha_{k+c} - \alpha_{r+c} + k - r - 1,\ k - r - 1;\ q) + \sum_{r=1}^{k} v_r\, B(\alpha_{k+c} - \alpha_{r-c} + k - r - 1,\ k - r - 1;\ q). \tag{6.20}$$
This corresponds to the first equation in (3.18). Unfortunately (6.20) is not of the pure convolution type, since $\alpha_{k+c} - \alpha_{r+c}$ and $\alpha_{k+c} - \alpha_{r-c}$ are not functions of the two variables $k - r$ and $c$. The trouble comes from the fact that $\alpha_k$, as defined by (6.5), is not a linear function of $k$. It is, however, plausible that we shall commit only an asymptotically negligible error if we omit the brackets in (6.5), that is, if we replace $\alpha_k$ by $pk/q$. Purely formally, (6.20) then reduces to the first equation in (3.18) with

$$p_k(c) = B\Bigl(\frac{p(k + c)}{q} + k - 1,\ k - 1;\ q\Bigr). \tag{6.21}$$

(Here the first argument in the right-hand member is no longer necessarily an integer, and the factorials in the definition (2.2) should be interpreted by means of the gamma function.) To the new system (3.18) the considerations of section 3 apply almost word for word: the only differences lie in the new norming (6.14) (which replaces (3.2)) and in the fact that instead of (3.26) we shall naturally let $k/n \to t$. Thus the limiting form of Theorem 1 applies to the new system (3.18) with $p_k(c)$ defined by (6.21).

It remains to prove that the formal replacement of (6.20), and of the corresponding equation for $-c$, by (3.18) was legitimate. Now all coefficients in (6.20) are of the form $B(\nu, \rho; q)$, and we have only changed the first argument, $\nu$, adding a variable quantity which in no case exceeds one unit. In passing to the limit we put $k \sim tn$ and $c \sim z n^{1/2} p^{-1/2}$. It follows that we actually use only coefficients $B(\nu, \rho; q)$ where $\nu \to \infty$, $\rho \to \infty$ and $\rho/\nu \to q$. Accordingly, for $|\theta| < 1$ we have $B(\nu + \theta, \rho; q) \sim B(\nu, \rho; q)$, and it is rather obvious that our system (6.20) is asymptotically equivalent to (3.18).
APPLICATION OF RECURRENT SERIES IN RENEWAL THEORY

By Alfred J. Lotka

Metropolitan Life Insurance Company

Summary. The application of integral equations to renewal theory in population analysis and problems of industrial replacement is beset with certain difficulties which have been particularly discussed by W. Feller (these Annals, 1941, vol. 12, pp. 243-267). Some of these difficulties are avoided if the data of the problem are introduced into the analysis directly in the discontinuous form (tabulated by class intervals) in which they are usually supplied in a concrete case. A numerical example based on population statistics is presented, illustrating how, using discontinuous data, a recurrent series takes the place of the integral equation, and a finite exponential series appears in place of the Heaviside expansion of the previous solution. There is close analogy with the procedure previously presented, but with factorial moments appearing in place of ordinary moments.

The fundamental data being given for values of the replacement function at discrete intervals only, some question arises as to the applicability of the solution as an “interpolation” formula for non-integral values of the time $t$, and as to the effects of subdividing the class interval of the original data.

In the actual computation of the factorial moments a shift of origin by one-half class interval becomes necessary. An algorithm for effecting this shift is presented.


1 . Methodology: Alternatives Available. 


All application of mathematics to concrete situations involves a greater or less degree of conventionalisation, a substitution, “in place of intractable reality, of an ideal upon which it is possible to operate.”¹

This conventionalisation may be only such as to do little violence to the concrete data, as for example when, dealing with a large population, we treat the number $N(t)$ of individuals at time $t$ as a continuous variable, knowing perfectly well that strictly speaking it varies by jumps of one unit at a time.²

In dealing with any particular concrete case there may be considerable choice as to the mode in which the conventionalisation or idealisation is carried out, and the particular place or step in the scheme at which it is introduced. A good illustration of this is met in the treatment of renewal theory, as applied to human populations or other biological or industrial aggregates.

The majority of authors who have dealt with the subject have set up their fundamental equations in terms of continuous variables. Many have gone further than this in the process of conventionalisation and have assumed for the renewal function (net reproductivity) some more or less appropriate mathematical expression, such as a Charlier or a Pearson [1] frequency distribution, and have, wherever possible, carried out by standard methods the integrations involved.

¹ Nature, Vol. 110 (1922), p. 764.

² If the population is subject to extreme variation in numbers, such that $N(t)$ passes through small values, this disregard of their discontinuity may not be permissible.

Others, while retaining the formulation of the fundamental equations in continuous (infinitesimal) form, have made no specific assumptions regarding the analytical form of the renewal function, and have carried out the numerical integration by one of the established methods available for the approximate integration of arbitrary functions.

But there has also been a minority of authors who deemed it most appropriate, since the data of the problem are actually furnished in tabular (and hence discontinuous) form, to apply from the start discontinuous methods in formulating the fundamental equation for the problem. This equation then defines a recurrent series.

The most recent and also the most concise exposition of this approach to the problem is a paper by W. Dobbernack and G. Tietz presented at the Twelfth International Congress of Actuaries, 1940, Proceedings, vol. 4, p. 233. These authors, however, do not give any numerical application, and in consequence certain aspects of the analysis are not touched upon by them. A more detailed presentation, including numerical applications, was given by the late S. D. Wicksell³ who, however, used only roughly approximate data (an over-all average net reproductivity for ages 20 to 44) and also introduced certain linear interpolations which would not be appropriate with more exact data, and which become unnecessary in the numerical operations if moments are introduced as indicated in what follows.

The purpose of the present paper is to exhibit this modification of the method of recurrent series, and at the same time to illustrate its relation to the method which proceeds in terms of a continuous variable, leading to an integral equation.

The $B(t - a)$ women born in the calendar year $(t - a)$, that is, between the times $(t - \tfrac12 - a)$ and $(t + \tfrac12 - a)$, will be $a$ years old some time during the calendar year $t$, that is, between $t - \tfrac12$ and $t + \tfrac12$. If their births were evenly distributed over the year $t - a$, so will their birthdays of age $a$ be over the year $t$, and their average age during that year will be $a$ and the average number of survivors to that age during the year $t$ will be approximately $B(t - a)p(a)$, where $p(a)$ is the probability, at birth, of surviving to age $a$. If the annual female reproductive rate, counting daughters only, is $m(a)$ at age $a$, then the $B(t - a)p(a)$ survivors will, during the calendar year $t$, give birth to $B(t - a)p(a)m(a)$ daughters. If $B(t)$ is the total number of births of daughters in the calendar year $t$, then evidently, for positive values of $t$,

$$B(t) = \sum_{a=1}^{\omega} B(t - a)\, p(a)\, m(a), \tag{1}$$

³ [2]; see also [3].
or, to simplify the notation,

$$B(t) = \sum_{a=1}^{\omega} c_a B(t - a). \tag{2}$$

Equation (1) or (2) defines a recurrent series of the general form

$$B(t) = c_1 B(t - 1) + c_2 B(t - 2) + \cdots + c_\omega B(t - \omega), \tag{3}$$

where some of the coefficients $c$ may be zero and where $\omega$ denotes the upper limit of the reproductive period.

The trial substitution

$$B(t) = Q x^{-t} \tag{4}$$

in (3) gives

$$1 = c_1 x + c_2 x^2 + \cdots + c_\omega x^\omega. \tag{5}$$

The substitution (4) therefore satisfies (3) provided that $x$ is a root of the equation (5), of degree $\omega$ for $x$; and the same is evidently true for the more general substitution

$$B(t) = \sum_{j=1}^{\omega} Q_j x_j^{-t}, \tag{6}$$

where $x_j$, with $j = 1, 2, \dots, \omega$, are the $\omega$ roots of (5).

Equation (5) leaves the $\omega$ coefficients $Q_j$ indeterminate. In general they appear as arbitrary constants. In any concrete application they may be determined by “initial” conditions; that is, in order to make the problem determinate, it is necessary to be given the values of $B(t)$ for $\omega$ successive integral values of $t$, or some equivalent data.

While, for convenience in description, the analysis has been developed in terms of the year as time unit, the formulae are evidently independent of this choice of unit, provided that the unit employed is adequate for practical application.

Whatever the unit employed, for the direct application of (1) and (3) to a concrete case it is necessary to have the data in such form that values of $p(a)m(a)$ are known for integral values of $a$. The pertinent statistics do not usually come in that form, the fertility being usually known only for five year age groups, and though it may be sufficient for practical purposes to regard those quinquennial values as representing $p(a)m(a)$ for the midpoint of the group, this yields $p(a)m(a)$ for fractional values of $a$, as measured in five year units. We may then proceed as follows: putting

$$x = 1 + y \tag{7}$$
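in (5) this becomes

$$\begin{aligned} 1 = {} & \{c_1 + c_2 + c_3 + \cdots + c_\omega\} \\ & + \{c_1 + 2c_2 + 3c_3 + \cdots + \omega c_\omega\}\, y \\ & + \Bigl\{c_2 + 3c_3 + 6c_4 + \cdots + \frac{\omega(\omega - 1)}{2!} c_\omega\Bigr\}\, y^2 \\ & + \Bigl\{c_3 + 4c_4 + 10c_5 + \cdots + \frac{\omega(\omega - 1)(\omega - 2)}{3!} c_\omega\Bigr\}\, y^3 \\ & + \cdots + \{c_\omega\}\, y^\omega \\ = {} & \sum_{h=0}^{\omega} y^h \sum_{a=h}^{\omega} \binom{a}{h} c_a. \end{aligned} \tag{8}$$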

In application to a particular population, we shall usually have the condition

$$c_a = 0 \quad \text{for } 1 \le a < \alpha,$$

where $\alpha$ is the lower limit of the reproductive period.

The expressions in brackets (coefficients of successive powers of $y$) will be recognized as cumulations $S_h$ of the values of the function $c_a$, summed backwards to the “diagonal” element $c_h$, where $h$ is the exponent of $y$. In terms of moments $m$ of the function $c_a$, equation (8) can be written

$$1 = m_0 + m_1 y + \frac{m_2 - m_1}{2!} y^2 + \frac{m_3 - 3m_2 + 2m_1}{3!} y^3 + \cdots + c_\omega y^\omega, \tag{9}$$

or, using the symbol $m_{[h]}$ to denote the $h$th factorial moment, equation (9) takes the simple form

$$1 = \sum_{h=0}^{\omega} \frac{m_{[h]}}{h!}\, y^h. \tag{10}$$
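The cumulation scheme described above can be made concrete. The sketch assumes that $c_a$ is given for $a = 0, 1, \dots, \omega$ with $c_0 = 0$. Each pass forms backward cumulative sums, and after $h + 1$ passes the entry at the “diagonal” position $a = h$ equals $\sum_a \binom{a}{h} c_a = m_{[h]}/h!$, which is the coefficient of $y^h$ in (8) and (10). Exact rational arithmetic and the sample coefficients are choices of this sketch, not data from the paper.

```python
from fractions import Fraction
from math import comb, factorial


def bracket_coefficients(c):
    """Coefficients of y^h in (8): sum_{a=h}^{omega} C(a, h) c_a, h = 0..omega (c[a] = c_a)."""
    omega = len(c) - 1
    return [sum(comb(a, h) * c[a] for a in range(h, omega + 1)) for h in range(omega + 1)]


def by_cumulation(c):
    """The same coefficients by repeated backward cumulation, read at the diagonal a = h."""
    omega = len(c) - 1
    col = list(c)
    out = []
    for h in range(omega + 1):
        running = 0
        for a in range(omega, -1, -1):   # col[a] <- col[a] + col[a+1] + ... + col[omega]
            running += col[a]
            col[a] = running
        out.append(col[h])               # after h + 1 passes, take the diagonal entry
    return out


def factorial_moments(c):
    """m_[h] = sum_a a(a-1)...(a-h+1) c_a about a = 0, for h = 0..omega."""
    omega = len(c) - 1
    return [sum(Fraction(factorial(a), factorial(a - h)) * c[a] for a in range(h, omega + 1))
            for h in range(omega + 1)]


if __name__ == "__main__":
    # hypothetical c_0..c_5 with c_0 = c_1 = 0 (reproduction begins at a = 2)
    c = [Fraction(0), Fraction(0), Fraction(1, 10), Fraction(3, 10), Fraction(2, 5), Fraction(1, 5)]
    print(by_cumulation(c))
```

Only additions are needed, which is the practical attraction of cumulation for hand computation.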


In the expressions (9) and (10) the moments $m_h$ and $m_{[h]}$ are those taken about $a = 0$. Actually, the net reproduction rates are given for “semi-values” of $a$, that is, for values of $a$ which are odd multiples of $\tfrac12$ (using five years as the time unit). By cumulation of these given values, moments $m_h'$ and $m_{[h]}'$ about $a = -\tfrac12$ are obtained.⁴ From the latter the corresponding functions of the moments about $a = 0$ are obtained by the transformation formulae⁵

$$\frac{m_{[h]}}{h!} = \sum_{k=0}^{h} \frac{(-\tfrac12)^{[h-k]}}{(h-k)!} \cdot \frac{m_{[k]}'}{k!}, \qquad S_h = \sum_{k=0}^{h} \frac{(-\tfrac12)^{[k]}}{k!}\, S_{h-k}'. \tag{11}$$


⁴ In these cumulations zero values of $c_a$ for $0 < a < \alpha$ must not be omitted.

⁵ In accordance with a customary notation the symbol $(-\tfrac12)^{[k]}$ denotes the continued product $(-\tfrac12)(-\tfrac12 - 1)(-\tfrac12 - 2) \cdots (-\tfrac12 - k + 1)$. In the computation of successive terms in the sums in the right-hand member of (11), by appropriately laying out the work, advantage is taken of the fact that values of $(-\tfrac12)^{[k]}/k!$ for $k = 2, 3, \dots$ are obtained each from the preceding by multiplying successively by $\tfrac34, \tfrac56$, etc., and taking care of the sign, so that fractions with complicated numerators and denominators are avoided.
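Formula (11) is the binomial (Vandermonde) expansion of falling factorials, $a^{[h]} = \sum_k \binom{h}{k} (a + \tfrac12)^{[k]} (-\tfrac12)^{[h-k]}$, summed against $c_a$. The sketch below builds the factors $(-\tfrac12)^{[k]}/k!$ by the successive multiplication described in footnote 5, each factor being the previous one times $-(2k-1)/2k$. It then checks (11) against a direct computation for an invented set of values $c_a$ at the semi-values $a = \tfrac12, \tfrac32, \dots$. The data and names are illustrative only.

```python
from fractions import Fraction
from math import factorial


def falling(x, k):
    """x^[k] = x (x - 1) ... (x - k + 1), with x^[0] = 1."""
    out = Fraction(1)
    for i in range(k):
        out *= x - i
    return out


def half_factors(hmax):
    """(-1/2)^[k] / k! for k = 0..hmax; factor k is factor k-1 times -(2k - 1)/(2k)."""
    out = [Fraction(1)]
    for k in range(1, hmax + 1):
        out.append(out[-1] * Fraction(-(2 * k - 1), 2 * k))
    return out


def shift_half(mprime):
    """(11): factorial moments m'_[k] about a = -1/2  ->  m_[h] about a = 0."""
    H = len(mprime) - 1
    w = half_factors(H)
    return [factorial(h) * sum(w[h - k] * mprime[k] / factorial(k) for k in range(h + 1))
            for h in range(H + 1)]


if __name__ == "__main__":
    # hypothetical values c_a at the semi-values a = 1/2, 3/2, 5/2, 7/2
    data = {Fraction(1, 2): Fraction(1, 10), Fraction(3, 2): Fraction(3, 10),
            Fraction(5, 2): Fraction(2, 5), Fraction(7, 2): Fraction(1, 5)}
    H = 4
    mprime = [sum(falling(a + Fraction(1, 2), k) * ca for a, ca in data.items()) for k in range(H + 1)]
    direct = [sum(falling(a, h) * ca for a, ca in data.items()) for h in range(H + 1)]
    print(shift_half(mprime) == direct)
```

The moments about $a = -\tfrac12$ involve only the integers $a + \tfrac12 = 1, 2, \dots$, which is why cumulation of the tabulated values yields them directly.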
It will be recalled that in the treatment of the problem of replacement by means of an integral equation,⁶ a solution in the form

$$B(t) = \sum Q_j x_j^{-t} = \sum Q_j e^{r_j t} \tag{12}$$

is obtained, in which the exponential coefficients $r_j$ are the roots of the equation

$$1 = \int_\alpha^\omega e^{-ra} p(a)m(a)\, da = \int_\alpha^\omega x^a p(a)m(a)\, da, \qquad x = e^{-r}, \tag{13}$$

or

$$1 = m_0 - m_1 r + \frac{m_2}{2!} r^2 - \frac{m_3}{3!} r^3 + \cdots = \sum_{h=0}^{\infty} (-1)^h \frac{m_h}{h!} r^h, \tag{14}$$

in close analogy to equation (10) for $y$, with the distinction, however, that in (10) the factorial moments take the place of the ordinary moments of (14), and that the series in (10) is finite, terminating at the term in $y^\omega$. There is also an important difference between the characteristic equation (13) and its analogue (5), namely that (5) may admit of negative roots for $x$, whereas (13) does not admit negative values for $x$.


2. The constants Q. These are determined by initial conditions, as follows. Equation (2) can be written

$$B(t) = \sum_{a=t}^{\omega} c_a B(t - a) + \sum_{a=1}^{t-1} c_a B(t - a) = F(t) + \sum_{a=1}^{t-1} c_a B(t - a), \tag{15}$$

with

$$F(t) = \begin{cases} \sum_{a=t}^{\omega} c_a B(t - a), & 0 < t \le \omega, \\ 0, & t > \omega. \end{cases} \tag{16}$$

The values of $B(t)$ being given for integral values of $t$, from $t = -(\omega - 1)$ to $t = 0$, it can be shown that⁷

$$Q_j = \frac{\sum_{t=1}^{\omega} F(t)\, x_j^t}{\sum_{a=1}^{\omega} a\, c_a x_j^a}. \tag{17}$$

⁶ For a discussion of the limits of applicability of this method see [4].

⁷ The reasoning is essentially the same as in the treatment of the problem by integral equations. See [5] and [2, p. 39 et seq.].
In the special case that we are tracing the progeny of an initial population all born at the same time, say $B(0)$ births occurring at $t = 0$, so that

$$B(-1) = B(-2) = \cdots = B(-[\omega - 1]) = 0, \tag{18}$$

the expression for $Q_j$, in view of (5), reduces to a particularly simple form. For if we write the summation in equation (16) in expanded form, we have

$$\begin{aligned} F(1) &= c_1 B(0) + c_2 B(-1) + c_3 B(-2) + c_4 B(-3) + \cdots + c_\omega B(-\omega + 1), \\ F(2) &= c_2 B(0) + c_3 B(-1) + c_4 B(-2) + \cdots + c_\omega B(-\omega + 2), \\ F(3) &= c_3 B(0) + c_4 B(-1) + \cdots + c_\omega B(-\omega + 3), \\ &\ \,\vdots \\ F(\omega) &= c_\omega B(0). \end{aligned} \tag{19}$$

If now $B(-1), \dots, B(-\omega + 1)$ all vanish, then

$$\sum_{t=1}^{\omega} F(t)\, x^t = B(0)(c_1 x + c_2 x^2 + c_3 x^3 + \cdots + c_\omega x^\omega) \tag{20}$$

$$\sum_{t=1}^{\omega} F(t)\, x^t = B(0) \quad \text{by (5)}. \tag{21}$$

Hence,

$$Q_j = \frac{B(0)}{\sum_{a=1}^{\omega} a\, c_a x_j^a}. \tag{22}$$

In particular

$$B(0) = \sum_{j=1}^{\omega} Q_j = B(0) \sum_{j=1}^{\omega} \frac{1}{\sum_{a=1}^{\omega} a\, c_a x_j^a}, \tag{23}$$

so that

$$\sum_{j=1}^{\omega} \frac{1}{\sum_{a=1}^{\omega} a\, c_a x_j^a} = 1. \tag{24}$$

The constant B(0) here evidently functions essentially as an arbitrary unit of annual births, and may with this understanding simply be put equal to 1, thereby simplifying the notation. This has been done in what follows, where convenient, especially in the table of constants, Table 3 of the numerical illustration.
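The following is a minimal numerical check of (5), (12), (22) and (24). It assumes NumPy and uses the adjusted integral-age values of c_a from Table 2, with B(0) = 1. It finds all eleven roots of (5) at once with numpy.roots, a companion-matrix eigenvalue method rather than the Newton-and-division procedure described in Section 4. It then forms the Q_j of (22) and compares the exponential series with the recurrent series carried forward directly. The helper names are illustrative only.

```python
import numpy as np

# Integral-age values of c_a (five-year units) from Table 2, as adjusted in its
# footnote: c_1 = c_2 = 0 and c_3 = .02073.
C = {3: 0.02073, 4: 0.21781, 5: 0.33400, 6: 0.27427, 7: 0.18963,
     8: 0.10607, 9: 0.02268, 10: 0.00116, 11: 0.00001}


def characteristic_roots(c):
    """All roots x_j of sum_a c_a x^a = 1, equation (5)."""
    omega = max(c)
    # numpy.roots expects coefficients from the highest power down to x^0.
    coeffs = [c.get(a, 0.0) for a in range(omega, 0, -1)] + [-1.0]
    return np.roots(coeffs)


def q_constants(c, roots, b0=1.0):
    """Q_j = B(0) / sum_a a c_a x_j^a, equation (22) (single initial cohort)."""
    return np.array([b0 / sum(a * ca * x**a for a, ca in c.items()) for x in roots])


def births_by_recurrence(c, t_max, b0=1.0):
    """Carry the recurrent series forward: B(t) = sum_a c_a B(t - a), B(t < 0) = 0."""
    b = [b0]
    for t in range(1, t_max + 1):
        b.append(sum(ca * b[t - a] for a, ca in c.items() if a <= t))
    return b


roots = characteristic_roots(C)
Q = q_constants(C, roots)
# Exponential-series solution B(t) = sum_j Q_j x_j^(-t); imaginary parts cancel in pairs.
series = [float(np.real(np.sum(Q * roots ** (-t)))) for t in range(24)]
direct = births_by_recurrence(C, 23)
```

Equation (24) appears here as np.sum(Q) ≈ 1. Because all eleven terms are kept, the series reproduces the recurrence to rounding error.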

The denominator in (17) or (23) can be evaluated for any root x_j of (5) by direct summation if the coefficients c_a are given or have been computed (as indicated below) for integral values of a. Alternatively, in a manner similar to that employed in passing from equation (5) to (8), the denominator can be expressed in terms of the corresponding roots y_j = x_j − 1 of (8) or (10), the cumulations of c_a being replaced by cumulations of a c_a. With the denominator so expressed, the constants Q_j take the form, in obvious analogy to equation (9),

$$Q_j = \frac{\sum_{t=1}^{\omega} F(t)\,x_j^{t}}{\sum_{h=0}^{\omega} \dfrac{m'_{[h]}}{h!}\, y_j^{h}}, \qquad m'_{[h]} = \sum_{a=1}^{\omega} a^{[h]}\,a\,c_a, \tag{25}$$

where a^{[h]} = a(a − 1)⋯(a − h + 1).

The alternative procedure, to which reference was made in the preceding paragraph, is to operate upon the moments m_[h] (taken about the origin 0) by a process the inverse of cumulation, which we might term decumulation, and in this way to obtain from them the coefficients c_a. The polynomial Σ a c_a x^a can then be evaluated directly.

The decumulation is readily carried out by an algorithm which suggests itself from the schedule of cumulation. Analytically the relation between the two processes is expressed by the reciprocal sets of transformation formulae:

Cumulation

$$\frac{m_{[h]}}{h!} = \sum_{k=0}^{\omega-h} \binom{h+k}{h} c_{h+k} \tag{26}$$

Decumulation

$$c_h = \sum_{k=0}^{\omega-h} (-1)^k \binom{h+k}{h} \frac{m_{[h+k]}}{(h+k)!} \tag{27}$$
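The pair (26)–(27) can be checked directly. The sketch below is plain Python with illustrative function names. It uses the adjusted integral-age c_a of Table 2, indexed from a = 0, cumulates them into b_h = m_[h]/h!, and decumulates back.

```python
from math import comb


def cumulate(c):
    """b_h = m_[h]/h! = sum_k C(h+k, h) c_{h+k}, equation (26); c[a] holds c_a, a = 0..omega."""
    omega = len(c) - 1
    return [sum(comb(h + k, h) * c[h + k] for k in range(omega - h + 1))
            for h in range(omega + 1)]


def decumulate(b):
    """c_h = sum_k (-1)^k C(h+k, h) b_{h+k}, equation (27), with b_h = m_[h]/h!."""
    omega = len(b) - 1
    return [sum((-1) ** k * comb(h + k, h) * b[h + k] for k in range(omega - h + 1))
            for h in range(omega + 1)]


# Adjusted integral-age values from Table 2 (list index = a in five-year units).
c = [0.0, 0.0, 0.0, 0.02073, 0.21781, 0.33400, 0.27427, 0.18963,
     0.10607, 0.02268, 0.00116, 0.00001]
b = cumulate(c)          # b[0] = sum of c_a, b[1] = sum of a c_a, ...
c_back = decumulate(b)   # recovers c up to floating-point rounding
```

Decumulation is an alternating sum of comparatively large terms. For that reason, rounding errors in the b_h are amplified in the recovered c_a.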


3. Constants Q associated with complex roots x_j = e^{−u_j + i v_j}.

The complex roots x_j give rise to oscillatory terms which, in the special case of the progeny of a cohort of B(0) births, take the form⁸

$$\frac{2B(0)\,e^{ut}}{G^2 + H^2}\,\bigl[G\cos vt - H\sin vt\bigr],$$

where

$$G = \sum_{a=1}^{\omega} a\,c_a e^{-ua}\cos va \qquad\text{and}\qquad H = \sum_{a=1}^{\omega} a\,c_a e^{-ua}\sin va.$$

These constants may be evaluated directly in this form, or, putting y = ξ + iη, …
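The real form above is just 2 Re(Q_j x_j^{−t}), with Q_j taken from (22) and B(0) = 1. The sketch below is plain Python with a hypothetical function name. The identity holds for any complex x, so the check uses an arbitrary x and made-up c_a rather than an actual root of (5).

```python
import cmath
import math


def pair_contribution(c, x, t, b0=1.0):
    """Oscillatory term of B(t) from the conjugate pair x, conj(x) (cohort case).

    With x = exp(-u + i v):
        2 B(0) e^{ut} (G cos vt - H sin vt) / (G^2 + H^2),
        G = sum_a a c_a e^{-ua} cos(va),  H = sum_a a c_a e^{-ua} sin(va).
    """
    u = -math.log(abs(x))
    v = cmath.phase(x)
    G = sum(a * ca * math.exp(-u * a) * math.cos(v * a) for a, ca in c.items())
    H = sum(a * ca * math.exp(-u * a) * math.sin(v * a) for a, ca in c.items())
    return 2 * b0 * math.exp(u * t) * (G * math.cos(v * t) - H * math.sin(v * t)) / (G**2 + H**2)


# Compare with 2 Re(Q x^{-t}), Q = B(0) / sum_a a c_a x^a, for an arbitrary complex x.
c_demo = {3: 0.2, 4: 0.5, 5: 0.3}
x_demo = cmath.rect(1.2, 1.06)
Q_demo = 1.0 / sum(a * ca * x_demo**a for a, ca in c_demo.items())
```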
(a) Numerical Illustration. For convenience and to furnish the opportunity for comparison, the same data (United States, 1920) were here employed as in the writer's earlier publications in which the problem was treated by the application of an integral equation.

(b) Cumulation for values of m_[h]. The two operations, of (1) cumulating the values of c_a given for semi-values of a, and (2) allowing in the cumulated results for a shift of origin from a = −1/2 to a = 0, can be conducted in one schedule as in Table 1. Cumulation is first carried out in the usual manner from the bottom line to the diagonal, with the result appearing immediately below the diagonal. From here on the procedure is as in the following example. Starting at the lower right-hand corner, we find

.00780 × (−1/2) = −.00390
.12770 × (−1/2) = −.06385,   −.06385 × (−3/4) = .04789
.97395 × (−1/2) = −.48698,   −.48698 × (−3/4) = .36524,   .36524 × (−5/6) = −.30437

[Fig. 1. Net fertility p(a)m(a), white females, United States, 1920. The verticals drawn in full and centered at mid-ages represent the original data; those drawn in dashed lines and centered at integral ages are interpolated.]
TABLE 1

Computation schedule for values of m_[h]/h! = Σ_a C(a, h) c_a of the net productivity function p(a)m(a) = c_a for integral values of age a*


a in 5-year 
units 

Ctt 

mm 

min 

m[ i\fl\ 

m di/31 

mui/41 


a) 

(2) 

C3) 

<*> 

(5) 

(«> 

(7) 

(8) 



1 16635 

6 64127 

10.04550 

24.34100 

23.10864 

16.05650 




-.58318 

.43738 

- 36448 

.31892 

-.28703 

0-1 

00000 

1 16035 

7 22445 

-3.61223 

2.70917 

-2.25764 

1.97644 

1-2 

00000 

1.16035 

6.05810 

19,82036 

-9 91018 

7 43204 

-0.19387 

2-3 

.00040 

1 16635 

4.89175 

13.76225 

31.00655 

-16.95328 

11.90496 

3-4 

09630 

1.16595 

3.72540 

8.87050 

18,14430 

33.62800 

-10.81400 

4-6 

.31265 

1 06905 

2.55945 

5.14610 

9 27380 

15.48370 

24.41096 

5-6 

31025 

.76710 

1 48080 

2,58565 

4.12870 

6.20990 

8.92725 

0-7 

23170 

44685 

73270 

1.00585 

1 54305 

2.08120 

2.71735 

7-8 

.15090 

.21515 

28586 

,36315 

.44720 

63816 

.03015 

8-9 

.05705 

06426 

.07070 

.07730 

08405 

.09096 

09800 

9-10 

.00615 

00630 

.00646 

00600 

.00675 

.00090 

.00705 

10-11 

.00016 

.00015 

00015 

.00015 

.00015 

00015 

.00015 









a jn 5-year 
units 

m(il/OI 

m(:j/7l 

m[i]/81 

mjij/DI 

tt»[ia]/lOI 


Factor 

(i) 

M 

(10) 

(U) 

(12) 

(13) 

(14) 

(15) 


6 72500 

1.99717 

.36404 

.03483 

.00127 

.00001 



.26311 

- 24432 

.22905 

- ,21633 

.20651 

-.19617 

-21/22 

0-1 

-1 77790 

1.62974 

-1 61333 

1.41875 

-1.33903 

1.27293 

-10/20 

1-2 

6 41964 

-4,87708 

4,47121 

-4.15184 

3.80236 

-3.07611 

-17/18 

2-3 

-9 97080 

8.72445 

-7.85201 

7.19768 

-6.68356 

6.26584 

-16/16 

3-4 

12.61050 

-10.50876 

9.10616 

-8.27564 

7.58000 

-7.04414 

-13/14 

4-5 

— 12.2054S 

9 15411 

-7 0284S 

6.67488 

-6 00739 

5 50077 

-11/12 

5-6 

12 38595 

-6.19295 

4.64474 

-3,87062 

3 38679 

-3.04811 

-0/10 

6-7 

3 4587C 

4 3126C 

-2 1503C 

1 61723 

-1.34769 

1 17923 

-7/8 

7-8 

74135 

. 8639C 

97395 1 —.48695 

.36524 

-.30437 

-5/0 

8-9 

1052C 

11255 

, 12005j .1277C 

-.06386 

.04789' -3/4 

9-10 

.00720 

.00731 

1 00760 ,00705 

.0078C 

-.003001 -1/2 

10-11 

OOOlf 

.00011 

| .OOOieJ ,00015 

,OO01E 

.000151 

i 


*Figures immediately below the diagonal, obtained by cumulation from the bottom upward of the data in Column 2, are factorial moments about a = −1/2. Figures in the top line are factorial moments about a = 0. For use of factors in the last column see text.


The several columns are thus completed. Then, by adding in each column the item immediately below the diagonal and all the items above the diagonal, the figures in the top line are obtained. These are the coefficients of equation (10) for y.
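The schedule can be mimicked in a few lines of plain Python; the names below are illustrative. Repeated cumulation of the semi-value data of Table 2 gives binomial moments about a = −1/2, which are the figures immediately below the diagonal. The shift to a = 0 then uses the identity C(x + s, h) = Σ_k C(s, k) C(x, h − k) with s = −1/2. Its coefficients C(−1/2, k) are generated by the successive factors −1/2, −3/4, −5/6, … of the last column of Table 1. The code forms these sums directly rather than reproducing the diagonal layout.

```python
def gen_binom(x, k):
    """Binomial coefficient C(x, k) for real x and integer k >= 0."""
    out = 1.0
    for i in range(k):
        out *= (x - i) / (i + 1)
    return out


def cumulate_from_bottom(col):
    """Running sums taken from the bottom line upward."""
    out, running = [0.0] * len(col), 0.0
    for i in range(len(col) - 1, -1, -1):
        running += col[i]
        out[i] = running
    return out


def moments_about_minus_half(c_semi):
    """b_h = sum_j C(j + 1, h) c_j for data in rows j = 0, 1, ... centred at a = j + 1/2.

    The (h+1)-fold cumulation read at row h - 1 (row 0 when h = 0) equals b_h,
    i.e. the entry immediately below the diagonal of the schedule.
    """
    col, b = list(c_semi), []
    for h in range(len(c_semi) + 1):
        col = cumulate_from_bottom(col)
        b.append(col[max(h - 1, 0)])
    return b


def shift_origin(b, s):
    """Moments about an origin moved by s: C(x + s, h) = sum_k C(s, k) C(x, h - k)."""
    return [sum(gen_binom(s, k) * b[h - k] for k in range(h + 1)) for h in range(len(b))]


# Table 2, semi-values a = 0.5, 1.5, ..., 10.5 (five-year units).
c_semi = [0.0, 0.0, 0.00040, 0.09630, 0.31255, 0.31025, 0.23170,
          0.15090, 0.05795, 0.00615, 0.00015]
b_minus_half = moments_about_minus_half(c_semi)
b_zero = shift_origin(b_minus_half, -0.5)   # coefficients m_[h]/h! of equation (10)
```

For example, b_minus_half[1] = 7.22445 and b_zero[1] = 7.22445 − 1.16635/2 ≈ 6.64127, matching the first-moment column of Table 1.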
(c) Decumulation. It is not necessary to carry out the decumulation, since the entire computation can, if desired, be carried out in terms of y's and m's. There is nevertheless considerable interest in noting the values c_a for integral values of a which result from the decumulation of the m's. These, together with the original values for semi-values of a, are shown in Table 2 and Fig. 1.


TABLE 2

Values of c_a = p(a)m(a)
(1) for semi-values of a: original data;
(2) for integral values of a: computed by cumulation of original data, shift of origin, and decumulation.

a (5-year units)   c_a        a (5-year units)   c_a        a (5-year units)   c_a
0.0                0          4.0                .21781     8.0                .10607
0.5                0          4.5                .31255     8.5                .05795
1.0                0*         5.0                .33400     9.0                .02268
1.5                0          5.5                .31025     9.5                .00615
2.0                0*         6.0                .27427     10.0               .00116
2.5                .00040     6.5                .23170     10.5               .00015
3.0                .02073*    7.0                .18963     11.0               .00001
3.5                .09630     7.5                .15090

* The value of c_2 came out negative, namely −.00057, and the value of c_1 came out +.00014. In the computation of Σ a c_a x^a these two values were arbitrarily adjusted to zero, and c_3 was diminished from .02118 to .02073 to make the total Σ_{a=1}^{11} c_a = 1.16635, summing only for integral values of a.


4. The roots of equations (5) and (8).

From the prior study already cited, the real positive root and three pairs of complex roots for r of the characteristic equation

$$\int_0^{\omega} x^{a}\,p(a)m(a)\,da = \int_0^{\omega} e^{-ra}\,p(a)m(a)\,da = 1 \tag{31}$$

were known. These were used to indicate the approximate location of the roots of (5) or (8), and more exact values were then obtained by Newton's method of successive approximation. Table 3 shows the values of u, v, etc., corresponding to the new roots

$$y = x - 1 = e^{-u+iv} - 1$$

obtained through equations (8) or (10), and, for comparison, the corresponding values obtained in the previous publication from equation (13). The same table also exhibits the remaining roots and the values of the constants Q, G, H.
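Newton's method for (5) is easily sketched in complex arithmetic. The minimal plain-Python version below uses a hypothetical function name and the adjusted integral-age c_a of Table 2. It starts, as the text describes, from the roots previously obtained from the integral equation (Table 3, part B). It works on (5) directly rather than on the equivalent form (8) in y = x − 1.

```python
import cmath
import math


def newton_root(c, x, tol=1e-12, max_iter=50):
    """Refine a root of P(x) = sum_a c_a x^a - 1 by Newton's method (real or complex x)."""
    for _ in range(max_iter):
        p = sum(ca * x**a for a, ca in c.items()) - 1.0
        dp = sum(a * ca * x**(a - 1) for a, ca in c.items())
        step = p / dp
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")


C = {3: 0.02073, 4: 0.21781, 5: 0.33400, 6: 0.27427, 7: 0.18963,
     8: 0.10607, 9: 0.02268, 10: 0.00116, 11: 0.00001}
# Starting values from the integral-equation roots (Table 3, part B): x = exp(-u + i v).
x_real = newton_root(C, math.exp(-0.02714))
x_pair = newton_root(C, cmath.exp(0.1930 + 1.0724j))
u_pair, v_pair = -cmath.log(x_pair).real, cmath.phase(x_pair)
```

Starting from the part B value of the first complex pair, the iteration settles near u ≈ −.198, v ≈ 1.065, the recurrent-series values in part A.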


TABLE 3

Constants of the series solution (6) of equation (2), corresponding to the five real and three pairs of complex roots of the characteristic equation (5)
(United States, white females, 1920)

Constants(1)   Five Real Roots                                    Three Pairs of Complex Roots

A. Computed on basis of recurrent series
u              .02714*   −1.764†   −3.812†   −17.1†   −94.3†      −.19800    −.44720    −.47587
v              0         0         0         0        0           1.06498    1.57000    2.40490
G              5.64467   7.73354   −1255.04  (2)      (2)         5.28093    10.45809   7.73103
H              0         0         0         0        0           3.03239    −3.06726   2.00874
G/(G²+H²)      .17716    .12931    −.00080   (2)      (2)         .14241     .08515     .12117
H/(G²+H²)      0         0         0         0        0           .08177     −.02086    .03148

B. Computed on basis of integral equation‡
u              .02714                                             −.1930     −.43055    −.4002
v              0                                                  1.0724     1.5771     2.44245
G              5.64514                                            5.15361    10.22405   7.40154
H              0                                                  2.08757    −3.72741   3.45312
G/(G²+H²)      .17715                                             .14625     .08020     .11095
H/(G²+H²)      0                                                  .08420     −.03135    .05175

(1) t in five-year units.
(2) Not computed.
* u = −log_e x_0 = −log_e .97322 = .02714.
† Values of x.
‡ See [6, p. 899].


To determine the remaining four roots, the product of the factors (y − y_1)(y − y_2)⋯(y − y_7) was divided out of the polynomial of equation (10), rejecting the remainder and leaving a fourth-degree equation

$$y^4 + 120y^3 + 2590y^2 + 14617y + 23118 = 0.$$

In the subsequent work it turned out that the roots of this were all real, and they were computed by obvious methods. Their values are also shown in Table 3. For the two numerically largest roots great accuracy was not attempted. They introduce terms with very rapid damping and presumably very small values of Q.¹¹
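Dividing out known factors and rejecting the remainder (deflation) can be sketched with NumPy's numpy.poly and numpy.polydiv. The function name is hypothetical. The example polynomial is made up so that the result can be checked; it is not equation (10).

```python
import numpy as np


def deflate(poly, known_roots):
    """Divide prod (y - y_i) over the known roots out of `poly`, rejecting the remainder.

    Coefficients run from the highest power down (the numpy.polydiv convention).
    """
    divisor = np.real_if_close(np.poly(known_roots))
    quotient, _remainder = np.polydiv(np.asarray(poly, dtype=float), divisor)
    return quotient


# A made-up quintic whose two remaining real roots should be recovered.
all_roots = [-0.03, 0.1 + 1.2j, 0.1 - 1.2j, -2.8, -4.8]
poly = 0.5 * np.real_if_close(np.poly(all_roots))
quotient = deflate(poly, all_roots[:3])
remaining = np.sort(np.roots(quotient).real)
```

Dividing the quotient by its leading coefficient gives the monic form in which the quartic above is written. With only approximate roots, the rejected remainder is small but not zero. This is why the full product is checked against (10) afterwards.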

¹⁰ The divergence is due in part to details of computation. In the earlier publication the curve of fertility m(a) was smoothed by the method of translation, with a Gaussian distribution as basis. In the method here presented the raw data were used without smoothing, except such as is inherent in the process of the calculation described.

¹¹ At any rate, Q_10 + Q_11 must be small, since Q_1 + ⋯ + Q_9 = 1.00313, and according to (24), with the convention that B(0) = 1, the sum of all the Q_j must be equal to unity.
As a check, in order to be assured that no serious error was introduced in neglecting the remainder after dividing out the factors (y − y_1) up to (y − y_7), the product Π_{i=1}^{11}(y − y_i) was computed. After multiplying by a factor to make the absolute terms agree (.16635), it was compared with the polynomial of (10). As a further indication, the coefficients of the product Π were "decumulated" to obtain values of the coefficients of the corresponding polynomial in x, to compare with the values of c_a. The results are shown in Table 4. In view of the fact that the (numerically) highest roots were determined only in first approximation, the agreement is satisfactory.

TABLE 4

Coefficients of Powers of y in Equation (10) and in the Product Π_{i=1}^{11}(y − y_i); also Coefficients of Powers of x in Equation (5)

        Coefficients of y^a                   Coefficients of x^a in Equation (5),
                                              found by decumulation
a       In Equation (10)   In Π(y − y_i)      Of Column (2)    Of Column (3)
(1)     (2)                (3)                (4)              (5)
0       .16635             .16635             −1.00000         −.99915
1       6.64127            6.64072            +.00014*         +.00065
2       16.64550           16.64782           −.00057*         −.00432
3       24.34106           24.34197           .02118*          .02398
4       23.16840           23.18070           .21781           .21774
5       15.05650           15.07338           .33400           .33354
6       6.72500            6.73812            .27427           .27474
7       1.99717            2.00316            .18963           .18882
8       .36404             .36555             .10607           .10641
9       .03483             .03501             .02268           .02276
10      .00127             .00128             .00116           .00117
11      .00001             .00001             .00001           .00001

* In computing the denominator of Q according to (22), the values of the coefficients c_1 and c_2 were arbitrarily made zero and the value of c_3 (age 15) was adjusted to .02073 to retain the total Σ c_a = 1.16635.

It is to be noted that instead of applying the solution (6) to compute values of B(t), these latter can, of course, also be obtained directly, by carrying forward step by step the original recurrent series. Alternatively, the births in successive generations can be computed step by step and the total births obtained by addition. The advantage of the solution (6) is that it enables one, if desired, to obtain B(t) for any value of t without having to compute B(t) for all intervening values of t. The solution in an exponential series also gives a better idea of the general nature of the process, as well as a direct indication of its asymptotic course for large values of t, when the first term Q_0 x_0^{−t}, with the positive real root x_0, dominates all others. However this may be, it is interesting to compare

TABLE 5

Synopsis of Results of Computation of B(t) as Σ Q x^{−t}, Column (8), and as Σ B_n(t), Column (9), where B_n(t) = Births per Unit of Time in nth Generation at Time t (Time Unit = 5 years)

Columns (2) to (4): Q x^{−t} for x = .97322, −1.764, −3.812. Columns (5) to (7): 2e^{ut}(G cos vt − H sin vt)/(G² + H²) for u = −.19800, −.44720, −.47587 and v = 1.06498, 1.57000, 2.40490.

(1) t    (2)    (3)    (4)    (5)    (6)    (7)    (8)

0 

17,716 

12,931 

-80 

28,482 


24,234 

100,313 

1 

18,204 

-7,330 

21 

-415 


3,828 

-13,781 

627 

2 

18,704 

4,156 

-6 

-19,498 

_ 

EsE3 

3,329 

-274 

3 

19,219 

-2,366 

1 


_ 

Ok? 3 


2,326 

4 

19,748 

1,336 




2,844 

-3,362 

21,588 

6 

20,291 

-757 


afiijjf n 



2,223 


6 

20,850 

429 


jjiylj 

- 

■1,162 

-749 

27,470 

7 

21,423 

-243 


jpplfjis 



-169 

19,745 

8 

22,013 

138 




475 

445 

16,823 

9 

22,619 

-78 


-4,294 


Hi 

-344 


10 

23,241 

44 


792 



145 


11 

23,880 

-25 


3,519 


-45 

-1 

27,328 

12 

24,538 

14 


2,265 


79 

-55 

26,841 

13 

25,213 

-8 


j™ frnvBxl- j 


18 

61 


14 

25,907 

5 


l 


-32 

-26 

23,878 

15 

26,619 

-3 


-1,188 


-8 

4 

25,424 

16 

27,352 

1 


385 


13 

■ 1 Kr 

27,757 

17 

28,105 

-1 


: mm 

1 

3 

— 7 


18 

28,878 

1 


mm 


-5 

4 


19 

29,673 



-251 


-1 

-1 


20 

30,489 





2 

-1 

29,874 

21 

31,328 





1 

1 

31,008 

22 

32,191 



160 


-1 

-1 

32,349 

23 

33,076 



343 




33 ; 419 
TABLE 5 (Continued)

                   Generations, n
t      Σ B_n(t)    1          2          3          4          5          6
(1)    (9)         (10)       (11)       (12)       (13)       (14)       (15)
0      100,000
1      0
2      0
3      2,072       2,072
4      21,781      21,781
5      33,398      33,398
6      27,472      27,429     43
7      19,866      18,963     903
8      16,735      10,607     6,128
9      17,954      2,268      15,685     1
10     24,033      116        23,889     28
11     27,361      1          27,022     338
12     26,878                 24,905     1,973
13     24,696                 18,481     6,214      1
14     23,851                 10,980     12,858     13
15     25,410                 5,345      19,941     124
16     27,759                 2,050      25,030     679
17     29,219                 526        26,316     2,377
18     29,506                 76         23,527     5,897      6
19     29,414                 5          18,092     11,271     46
20     29,862                            12,041     17,579     242
21     31,000                            6,906      23,191     903
22     32,348                            3,381      26,442     2,523      2
23     33,423                            1,397      26,426     5,583      17


the result of the computation by means of the exponential series, carried out as 
set forth above, with the corresponding results of the computation of births in 
successive generations. This comparison is exhibited in Table 5. 
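The generation-by-generation computation can be sketched in plain Python; the names are illustrative. Births are built up generation by generation, summed, and checked against the recurrent series carried forward directly. The adjusted c_a of Table 2 and B(0) = 1 are assumed, so individual figures need not agree with Table 5 in the last digit.

```python
def births_by_generation(c, t_max, n_max, b0=1.0):
    """B_n(t), births at time t in generation n, starting from B_0(0) = B(0).

    Generation n is born to mothers of generation n - 1:
        B_n(t) = sum_a c_a B_{n-1}(t - a).
    """
    gens = [[b0] + [0.0] * t_max]
    for _ in range(n_max):
        prev = gens[-1]
        gens.append([sum(ca * prev[t - a] for a, ca in c.items() if a <= t)
                     for t in range(t_max + 1)])
    return gens


def births_by_recurrence(c, t_max, b0=1.0):
    """Total births carried forward directly: B(t) = sum_a c_a B(t - a)."""
    b = [b0]
    for t in range(1, t_max + 1):
        b.append(sum(ca * b[t - a] for a, ca in c.items() if a <= t))
    return b


C = {3: 0.02073, 4: 0.21781, 5: 0.33400, 6: 0.27427, 7: 0.18963,
     8: 0.10607, 9: 0.02268, 10: 0.00116, 11: 0.00001}
# The youngest reproductive age is a = 3, so by t = 23 at most 7 generations are nonzero.
gens = births_by_generation(C, t_max=23, n_max=8)
totals = [sum(g[t] for g in gens) for t in range(24)]
direct = births_by_recurrence(C, 23)
```

For example, gens[2][7] = 2 c_3 c_4 ≈ .00903, i.e. about 903 per 100,000 initial births, as in column (11) of Table 5.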

It will be seen that the agreement is good except for the second to fourth items, where perhaps the omission of the terms contributed by the numerically highest roots makes itself felt.

5. Discussion.

(a) The real roots of the characteristic equation (5). It can be shown [8] that only one of the real roots for x can be positive, and that the absolute value of any other root must be greater than the positive real root.
The negative real roots which make their appearance in the numerical example call for special comment. Practically, the "higher" negative roots are of little importance, at any rate in this example. First, the constants Q with which they are associated are relatively small. Second, large absolute values of negative roots imply rapid damping, so that the corresponding terms Q x^{−t} very soon become negligible as t increases. Thirdly, the determination of these roots would be subject to a wide range of uncertainty, corresponding to the large percentage fluctuations or errors of determination of the values of the function p(a)m(a) = c_a at the upper end of the reproductive period.

But in theory these negative real roots suggest some pertinent questions. One wonders what would happen to them if the data were given, say, for single years of age, instead of 5-year groups. Instead of an equation of eleventh degree we would then have one of 55th degree. Furthermore, in those cases in which it may be permissible to pass to the limit, so that an integral equation takes the place of (2), negative roots for x would seem to be excluded, as they would make the integral in (13) meaningless.

A problem of perhaps little practical importance but of some theoretical interest may arise here, to which reference has also been made by P. H. Leslie in a recent article in Biometrika,¹² in connection with a different procedure.


(b) Effect of finer subdivisions of the histogram of p(a)m(a). The effect of this on equation (5) for x is not obvious at sight, since new coefficients would be inserted between previous terms. The effect is more easily understood from a consideration of equation (8) for y. Here finer subdivisions would introduce new terms only beyond the last term originally present. The original terms would not be changed at all in form, and those involving only lower moments would be changed but little in numerical value, provided that the original histogram were not so coarse as to give inappropriate values even for these lower moments.

The result, then, of finer subdivision of the histogram would be to change the computed values of the lower roots only in minor degree. But the four negative real roots, depending in considerable measure on the higher terms of (5) or (8), would presumably be materially altered, and might perhaps give place to further complex roots. In any case they would be followed by new roots even more remote from practical significance than the original eleven.


(c) Interpolation formula. Strictly speaking, the solution (6) of (2) is applicable only for integral values of t. In particular, terms arising out of the negative real roots of (5) for x are obviously not adapted to furnish interpolated values of B(t) for fractional values of t, since fractional powers of negative quantities in general are complex. Over the range of t where the first real root together with the three pairs of complex roots adequately describe the process under discussion, these terms alone are, in this sense and to this extent, suitable for interpolation, disregarding the terms corresponding to the other negative roots.

¹² See [10]. For a brief summary and analysis of Leslie's paper [9] see a review … in the Journal of the Institute of Actuaries Students' Society, Vol. 4 (1946), Part II. … method to these problems seems to be due to … Population Waves, Journal of the Burma Research Society, Vol. 31 (1941), Part I.

Even less suitable for interpolation purposes, it would seem, would be terms arising from further negative roots that might be introduced by a finer subdivision of the histogram of original data. Suppose this subdivision were carried to great lengths and negative roots still appeared. They would then give rise to rapidly oscillating positive and negative terms for even and odd integral values of t respectively (the time unit now being a subdivision of the original time unit), with no appropriate interpolation between these integral values.

One further point calls for comment. In the process of idealization of the problem discussed, it has been assumed that p(a) and m(a) are independent of time, and the conclusions reached must be construed in the light of this assumption. In itself this would hardly call for comment, as it is a matter of common understanding. But the question does arise whether the assumption itself is free from implied internal contradictions.

In a recent publication, P. K. Whelpton¹³ has drawn attention to the fact that in times of rapid changes in the birth rate, the assumption of age-specific fertilities being held constant at the values observed in a given calendar year may imply that some of the women had more than one first child, a logical impossibility.

The data used in the present numerical example are derived from a period of relatively undisturbed birth rate (1920), and do not involve any such conflict. But, in the light of Whelpton's contribution, one may ask the broader question whether the computation of an intrinsic rate of natural increase and related parameters based on age-specific fertility as observed in one calendar year retains any practical value at all.

In answering this question, two considerations will be weighed. First, ordinarily the rates computed in the usual way differ but little from those obtained by taking into account order of birth as in Whelpton's procedure. Secondly, the computation using over-all values of m(a) for all orders of birth combined is a relatively simple matter based on data commonly available. The more complete treatment of the problem, taking into account order of birth, is considerably more complicated and often not possible at all for lack of detailed data.
SOLUTION OF EQUATIONS BY INTERPOLATION

W. M. Kincaid
University of Michigan

Introduction and summary. The present paper deals with the numerical solution of equations by the combined use of Newton's method and inverse interpolation. In Part I the case of one equation in one unknown is discussed. The methods described here were developed by Aitken [1] and Neville [2], but do not seem as widely known as they should be, perhaps because the original papers are not readily available. (A short summary of Aitken's work will be found in a recent paper by Womersley [3].) Mention should also be made of an interesting paper by Spoerl [4], which treats the same problem from a somewhat different viewpoint.

In Part II these methods are extended to sets of simultaneous equations.


PART I. EQUATIONS IN ONE UNKNOWN 

1. Nature of the problem. We first consider the problem of locating, to any desired degree of accuracy, a real root x_0 of an equation of the form

$$y(x) = 0, \tag{1}$$

where y(x) is assumed to be analytic in an interval containing the root in question. Since we shall not be concerned here with the necessary preliminary work of separating the roots, etc., we may suppose that x_0 is known to lie within a given interval that contains no zeros of y′(x). (Multiple roots are thus excluded; but of course any such root is a simple root of an equation obtained from (1) by differentiation, and the methods described below can be applied to this equation.)


2. Aitken’s method of interpolation. The method to be described, which 
may be regarded as a generalization of Newton’s, depends on the use of inverse 
interpolation. It is therefore desirable to recall a few points from the theory of 
interpolation before proceeding further. 

Let f be a function such that f(t) is known for t = t_1, t_2, …, t_n. Then the Lagrange interpolating polynomial is defined by

$$f_{12\cdots n}(t) = f(t_1)\,\frac{(t-t_2)(t-t_3)\cdots(t-t_n)}{(t_1-t_2)(t_1-t_3)\cdots(t_1-t_n)} + f(t_2)\,\frac{(t-t_1)(t-t_3)\cdots(t-t_n)}{(t_2-t_1)(t_2-t_3)\cdots(t_2-t_n)} + \cdots + f(t_n)\,\frac{(t-t_1)(t-t_2)\cdots(t-t_{n-1})}{(t_n-t_1)(t_n-t_2)\cdots(t_n-t_{n-1})}. \tag{2}$$




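We note that

$$f_{12}(t) = f(t_1)\,\frac{t - t_2}{t_1 - t_2} + f(t_2)\,\frac{t - t_1}{t_2 - t_1} = \frac{1}{t_2 - t_1}\begin{vmatrix} f(t_1) & t_1 - t \\ f(t_2) & t_2 - t \end{vmatrix},$$

$$f_{123}(t) = \frac{1}{t_3 - t_2}\begin{vmatrix} f_{12}(t) & t_2 - t \\ f_{13}(t) & t_3 - t \end{vmatrix}, \quad \ldots, \quad f_{12\cdots n}(t) = \frac{1}{t_n - t_{n-1}}\begin{vmatrix} f_{12\cdots(n-1)}(t) & t_{n-1} - t \\ f_{12\cdots(n-2)n}(t) & t_n - t \end{vmatrix}, \tag{3}$$

so that f_{12…n}(t) can be evaluated for any given value t_0 of t by a succession of linear interpolations. It is convenient to arrange the work in a table like the following (n = 4):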

TABLE Ia

t      f(t)      I            II            III             Parts
t_1    f(t_1)                                               t_1 − t_0
t_2    f(t_2)    f_12(t_0)                                  t_2 − t_0
t_3    f(t_3)    f_13(t_0)    f_123(t_0)                    t_3 − t_0
t_4    f(t_4)    f_14(t_0)    f_124(t_0)    f_1234(t_0)     t_4 − t_0


This form is well adapted for machine computation, for each denominator t_i − t_j = (t_i − t_0) − (t_j − t_0) automatically appears in one set of counters when the corresponding numerator is obtained in the other.
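Table Ia translates directly into a double loop. In the minimal plain-Python sketch below (function name hypothetical), stage i = 0 fills column I, stage 1 column II, and so on. Each new entry combines the first entry of the preceding column with the entry in its own row, exactly as in (3).

```python
def aitken(ts, fs, t0):
    """Aitken's iterated linear interpolation, equation (3), evaluated at t0.

    After stage i, vals[k] (k > i) holds f_{12...(i+1)(k+1)}(t0) in the text's
    1-based notation; at the end the last entry is f_{12...n}(t0).
    """
    vals = list(fs)
    n = len(ts)
    for i in range(n - 1):
        for k in range(i + 1, n):
            # The 2x2 determinant of (3) divided by t_k - t_i; nodes must be distinct.
            vals[k] = (vals[i] * (ts[k] - t0) - vals[k] * (ts[i] - t0)) / (ts[k] - ts[i])
    return vals[-1]
```

For instance, aitken([1.0, 2.0, 3.0, 4.0], [1.0, 8.0, 27.0, 64.0], 2.5) reproduces 2.5³ = 15.625, since four points determine a cubic exactly.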

If f′(t) is known at one or more of the given points, this information is readily fitted into the scheme. For we see that

$$f_{11}(t) = \lim_{t_2 \to t_1} f_{12}(t) = f(t_1) + (t - t_1)\,f'(t_1), \tag{4}$$

and all that is necessary is to repeat certain entries in Table Ia and to fill in column I by using (4) as indicated in Table Ib. The extension to higher derivatives is obvious.


TABLE Ib

t      f(t)      I            II            III             Parts
t_1    f(t_1)                                               t_1 − t_0
       f(t_1)    f_11(t_0)                                  t_1 − t_0
t_2    f(t_2)    f_12(t_0)    f_112(t_0)                    t_2 − t_0
t_3    f(t_3)    f_13(t_0)    f_113(t_0)    f_1123(t_0)     t_3 − t_0










In applying the above to obtaining the root x_0 of (1), we must suppose that y(x) is tabulated or can be computed for a set of values of x in the neighborhood of x_0. What we do not know is the value of x corresponding to y = 0. It is therefore convenient to regard x as a function of y whose value is known at certain points, and then to interpolate to get x_0 = x(0). That is, we let y take the place of t and x that of f(t) in the preceding discussion, while 0 replaces t_0. The work is slightly simplified by the fact that the column of "parts" becomes identical with the left-hand column, which contains the y's, and can therefore be omitted.


3. Application to an example. The procedure will be most clearly indicated by an example. Consider the equation

$$y = x^4 + 2x^3 - 5x^2 - 8x + 1 = 0, \tag{5}$$

which has a root between 0 and 1. (If the root were located elsewhere, it would be desirable to shift it to this interval in order to simplify the computation of y.)

The work of evaluating this root to ten places is summarized in Table II, and explained below. In the first column, the numbers in parentheses are values of dy/dx, and the other numbers are values of y, corresponding to the values of x in the second column.


TABLE II

y                      x        I                  II                 III
1.000 000 000 000      0.000
(−8.000 000 000)       0.000    0.125 000 000 00
0.152 100 000 000      0.100    0.117 938 436 13   0.116 671 702 00
−0.001 054 385 279     0.117    0.116 882 964 17   0.116 884 075 87   0.116 883 877 01
(−9.081 459 548)       0.117    0.116 883 896 94   0.116 883 890 62   0.116 883 890 68
0.008 022 855 936      0.116    0.116 883 842 98   0.116 883 890 67   0.116 883 890 68
(−9.073 020 416)       0.116    0.116 884 254 15   0.116 883 890 74

x_0 = 0.116 883 890 7


The procedure is as follows. Taking x = 0 as a first approximation to x_0, we find that y(0) = 1, y′(0) = −8, and record these data in the y and x columns of the table. Note that for convenience, the value of y′(0) takes the place of a blank entry in Table Ib. We now apply (4), which here takes the form

$$x_{11}(0) = x + (0 - y)\,\frac{dx}{dy} = 0 + \frac{-1}{\left.\dfrac{dy}{dx}\right|_{x=0}} = 0 + \frac{-1}{-8} = 0.125, \tag{6}$$

and enter the result in column I. Note that this is equivalent to one step of Newton's method.
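The first entries of columns I and II can be reproduced in a few lines. This plain-Python sketch uses illustrative names and writes out (6) and (3) by hand rather than calling a general routine. The repeated node y(0) stands in for the derivative row.

```python
def y(x):
    """Left-hand side of equation (5)."""
    return x**4 + 2 * x**3 - 5 * x**2 - 8 * x + 1


def dy_dx(x):
    return 4 * x**3 + 6 * x**2 - 10 * x - 8


t0 = 0.0                     # x is interpolated at y = 0
y1, x1 = y(0.0), 0.0         # first row of Table II
y3, x3 = y(0.1), 0.1         # third row (the second row holds y'(0))

x11 = x1 + (t0 - y1) / dy_dx(x1)                        # equation (6): one Newton step
x13 = (x1 * (y3 - t0) - x3 * (y1 - t0)) / (y3 - y1)     # column I: linear inverse interpolation
x113 = (x11 * (y3 - t0) - x13 * (y1 - t0)) / (y3 - y1)  # column II: (3) with the repeated node y1
```

These give 0.125, 0.117 938 436 1… and 0.116 671 702 0…, the first entries in columns I and II of Table II.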

In view of (6) we take x = 0.1 for our next approximation and apply (3) to obtain the second entry in column I and the first in column II. This last suggests x = 0.117 for our next trial value. (We do not compute y′(0.1), as little would be gained by doing so, and the time is better spent in going ahead as indicated.) Finding y(0.117) and filling in the table gives us the root to six places.
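Putting the pieces together gives an iteration of the kind described. The plain-Python sketch below uses hypothetical names and, unlike Table II, only function values. Each interpolated x is evaluated and added as a new node, and only the last few nodes are kept. It therefore omits the derivative rows and the three-place restriction on arguments discussed in the next section.

```python
def aitken(ts, fs, t0):
    """Aitken's iterated linear interpolation, equation (3), at t0 (distinct nodes)."""
    vals = list(fs)
    for i in range(len(ts) - 1):
        for k in range(i + 1, len(ts)):
            vals[k] = (vals[i] * (ts[k] - t0) - vals[k] * (ts[i] - t0)) / (ts[k] - ts[i])
    return vals[-1]


def y(x):
    """Left-hand side of equation (5)."""
    return x**4 + 2 * x**3 - 5 * x**2 - 8 * x + 1


def root_by_inverse_interpolation(func, xs, window=4, tol=1e-13, max_iter=30):
    """Treat x as a function of y and interpolate it at y = 0, adding each estimate as a node.

    Only the last `window` nodes are used, so the degree of interpolation stays low.
    """
    xs = list(xs)
    ys = [func(x) for x in xs]
    for _ in range(max_iter):
        x_new = aitken(ys[-window:], xs[-window:], 0.0)
        if abs(x_new - xs[-1]) < tol:
            return x_new
        xs.append(x_new)
        ys.append(func(x_new))
        if ys[-1] == 0.0:
            return x_new
    raise RuntimeError("no convergence")


root = root_by_inverse_interpolation(y, [0.0, 0.1])
```

Starting from x = 0 and x = 0.1 the iteration settles on x_0 ≈ 0.116 883 890 7 within a handful of steps.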

4. Employment of tables. Continuing in the same line, it would seem natural to take x = 0.116884 at the next step, and doing so would lead to the most rapid convergence. But another consideration enters. Up to this point the values of y were computed with the aid of the WPA Table of Powers, which is limited to three places in the argument. Rather than going to the extra labor of evaluating y(0.116884), we proceed as indicated in the table, using y′(0.117), y(0.116) and y′(0.116), and stopping when the values of x in the last column agree to the desired number of places.

This point has been dwelt on because it is likely to arise whenever tables are used in evaluating y(x). In the example just given, to be sure, we had a certain freedom of choice; but if y(x) is not algebraic, direct computation may be quite impractical. It may be noted that in such cases the method of inverse interpolation is not only faster than the simple Newton's method but is capable of giving more accurate results.

The error in the final result can be estimated from the standard formula for the error of interpolation, but this may be awkward because it requires the evaluation of higher derivatives of x with respect to y. In practice it is generally safe to rely on agreement of different interpolated values, and of course the result may be checked by substitution in the original equation. One simple point is worth noting, however: if the error in the original column of x's is O(ε), that in the successive columns to the right is O(ε²), O(ε³), etc.

5. Applicability of the method. Although the example we have presented is algebraic, the method is, of course, equally applicable to transcendental equations. Moreover, it can be used, theoretically at least, to yield complex as well as real roots. The sole difficulty is that the numerical work becomes cumbersome in this case; how serious this is depends on the type of computing machines used. If the equation is algebraic, Bernoulli's [5], [6] and Graeffe's [7] methods are applicable. In fact, they are likely to be the most effective since they do not require prior knowledge of a first approximation to the root. If the alternative procedure of replacing the equation by two simultaneous equations for the real and imaginary parts of the root is decided upon, the methods described in the next section may prove useful.

PART II. SETS OF SIMULTANEOUS EQUATIONS 

6. Two equations; general considerations. It is natural to take up next the problem of finding the simultaneous solutions of two equations in two unknowns. Let these equations be

$$u(x, y) = 0, \qquad v(x, y) = 0, \tag{7}$$

where u and v are analytic functions of x and y.
If we had a general method of interpolation of functions of two independent
variables, the problem could be solved in a fashion similar to that used in the 
preceding section. That is, u and v would be computed for values of x and y 
near the desired ones; then x and y would be regarded as functions of u and v 
and interpolations would be performed to obtain the values corresponding to 
u = v = 0. 


It is easy to set up interpolating functions in a variety of ways, but the author has found none that are satisfactory for the problem in hand. Note that what is required is to determine the value of a function at any point in the plane, given its values at a set of fixed points. The most obvious idea is to use polynomials of the least possible degree for this purpose, as is done in the case of a single variable. In this case, however, the coefficients of a polynomial of the nth degree are determined by its values not at $n + 1$ but at $\tfrac{1}{2}(n + 1)(n + 2)$ points; thus if a function is given at 5 points, no unique quadratic interpolating polynomial can be constructed. What is worse, even if a function is given at 6 points, say, the quadratic polynomial determined will in general have large coefficients and take on unreasonable values if all the points happen to lie close to a common conic. Other schemes considered by the author have similar drawbacks, though the possibility of course remains of finding a suitable one by further research.

The problem can also be handled, at least in principle, by eliminating one of the
variables; but, apart from the difficulty of carrying this out in practice, the 
resulting single equation is likely to be more complicated in form than the original 
two. If so, solving it may require more computation than would be involved in 
attacking the original equations directly by the methods described below. So 
far is this true that even when a single equation is given in the first place it may 
be advantageous to replace it by a set of simpler equations. 


7. Newton’s Method. Although a direct extension of the method of inverse 
interpolation is not presently available, Newton’s method may be suitably 
generalized for this case. 

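Starting with equations (7), we set up the auxiliary variables

$$X = u v_y - v u_y, \qquad Y = u v_x - v u_x, \tag{8}$$

where the subscripts denote partial derivatives: $u_x = \partial u/\partial x$, $u_y = \partial u/\partial y$, etc.

We have

$$\frac{\partial X}{\partial x} = u_x v_y - v_x u_y + u v_{xy} - v u_{xy}, \qquad \frac{\partial X}{\partial y} = u v_{yy} - v u_{yy},$$
$$\frac{\partial Y}{\partial x} = u v_{xx} - v u_{xx}, \qquad \frac{\partial Y}{\partial y} = u_y v_x - v_y u_x + u v_{xy} - v u_{xy}. \tag{9}$$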
For u = 0, v = 0, equations (8) and (9) reduce to

$$X = Y = \frac{\partial X}{\partial y} = \frac{\partial Y}{\partial x} = 0, \qquad \frac{\partial X}{\partial x} = -\frac{\partial Y}{\partial y} = J, \tag{10}$$

where J is the Jacobian of u and v with respect to x and y, $J = u_x v_y - u_y v_x$.

Equations (10) will hold approximately for values of x and y near those satisfying equations (7). That is, in the neighborhood of a solution X can be regarded as a function of x alone and Y as a function of y alone. Then if $x = x_0$, $y = y_0$ is the desired solution, $(x_1, y_1)$ is a point in its neighborhood, and $X_1 = X(x_1, y_1)$, $Y_1 = Y(x_1, y_1)$, $J_1 = J(x_1, y_1)$, we have

$$x_0 \approx x_1 - \frac{X_1}{J_1}, \qquad y_0 \approx y_1 + \frac{Y_1}{J_1}. \tag{11}$$

Also if $(x_2, y_2)$ is another point near $(x_0, y_0)$,

$$x_0 \approx \frac{x_1 X_2 - x_2 X_1}{X_2 - X_1}, \qquad y_0 \approx \frac{y_1 Y_2 - y_2 Y_1}{Y_2 - Y_1}. \tag{12}$$

Relations (11) and (12) can be used to obtain successive approximations to the 
solution. Use of these relations corresponds to employing Newton’s method 
and linear interpolation for the solution of one equation in one unknown. 

As a first example we consider the equations

$$u \equiv x^2 + xy + y^2 - 3 = 0, \qquad v \equiv x^2 y + y^2 - 1 = 0. \tag{13}$$

We have

$$u_x = 2x + y, \quad u_y = x + 2y, \quad v_x = 2xy, \quad v_y = x^2 + 2y. \tag{14}$$


Drawing a rough graph indicates a solution near (1, 1). We evaluate u, v, etc., at this point as shown in Table III. Using (11) we get (2, 0) for a second approximation, and proceed as before. We can now use both (12) and (11) to get new approximations; they are (1.33, 0.57) and (1.25, 0.50), and are entered in the last two columns of the table. We therefore try (1.3, 0.5) next, and continue in this fashion until the desired accuracy is attained. Both (11) and (12) are used at each step and the values obtained are entered in the last two columns. The entries in the numbered rows are obtained by using (11), the others by using (12). The number of places to take in each succeeding step is judged from the agreement shown.

Table IV indicates the process of finding a second solution of (13) by the same method. The convergence is very rapid in this case, mainly because the first guess is fairly close.
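A minimal Python sketch of the iteration (11), applied to the system (13) with the partial derivatives (14), is given below. Solving the linearized equations $u + u_x\,dx + u_y\,dy = 0$, $v + v_x\,dx + v_y\,dy = 0$ by Cramer's rule gives exactly $dx = -X/J$ and $dy = Y/J$, so (11) is the ordinary two-variable Newton step written in terms of X, Y and J. The sketch uses only (11), not the secant-type relation (12), and the stopping tolerance is an illustrative choice.

```python
def u_v_and_partials(x, y):
    """u, v of system (13) and their first partial derivatives (14)."""
    u = x * x + x * y + y * y - 3.0
    v = x * x * y + y * y - 1.0
    ux, uy = 2.0 * x + y, x + 2.0 * y
    vx, vy = 2.0 * x * y, x * x + 2.0 * y
    return u, v, ux, uy, vx, vy


def newton_step(x, y):
    """One application of (11): x0 = x1 - X1/J1, y0 = y1 + Y1/J1."""
    u, v, ux, uy, vx, vy = u_v_and_partials(x, y)
    X = u * vy - v * uy          # definition (8)
    Y = u * vx - v * ux
    J = ux * vy - uy * vx        # Jacobian of (u, v) with respect to (x, y)
    return x - X / J, y + Y / J


def solve(x, y, tol=1e-13, max_iter=50):
    """Iterate (11) from (x, y); return the list of successive approximations."""
    path = [(x, y)]
    for _ in range(max_iter):
        x_new, y_new = newton_step(x, y)
        path.append((x_new, y_new))
        if abs(x_new - x) < tol and abs(y_new - y) < tol:
            break
        x, y = x_new, y_new
    return path


path = solve(1.0, 1.0)
print(path[1])   # (2.0, 0.0), the second approximation quoted in the text
print(path[-1])  # converged solution near (1.51383, 0.37500)
```

Later iterates of this sketch depend only on (11), so they need not coincide with the intermediate entries of Table III, where (12) was also used.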
8. Inverse interpolation. In the preceding section, attention was drawn to the difficulty that may arise when tables, necessarily limited to a certain number of places in the argument, are used in the computation. In the example just discussed the values of u and v were easily computed directly to the number of places wanted. But a glance at the work will show that if we had been limited in computing u and v to values of x and y having, say, two decimal places, the solutions could have been carried to four places only.

The device adopted in the preceding section was to use quadratic and cubic interpolates to secure greater accuracy, and it might occur to us to try the same idea here. But for such an interpolation to be strictly valid, equations (10) would have to hold identically. Since they hold only approximately, an error is introduced which, in general, is of the same order of magnitude as the error in linear interpolation. Thus continuing the interpolation would not improve the results.

However, this very situation suggests a way out. For suppose we give x a constant value $x_1$, and compute X and Y for a number of values of y. For $x = x_1$, both X and Y can be regarded as functions of y alone, or we can regard X and y as functions of Y. Doing so, we can interpolate to any number of stages to find values of X and y corresponding to Y = 0; call these $X_1$, $y_1$. Assigning x other constant values $x_2, x_3, \ldots, x_m$, we repeat the process, getting a set of values $X_2, \ldots, X_m$ and $y_2, \ldots, y_m$, all corresponding to Y = 0. Now along the curve Y = 0 we can regard x and y as functions of X; performing one more interpolation, we obtain the desired values of x, y corresponding to X = Y = 0. The error in the final result can be estimated from the errors in the interpolations, and is of the same order of magnitude as the greatest of these.

It will be noted that we did not refer to the definitions of X and Y in describing this procedure. Any pair of independent (analytic) functions X' and Y' having the property that X' = Y' = 0 when u = v = 0 could be used. However, it is convenient to choose them so that $\partial X'/\partial y$ and $\partial Y'/\partial x$ are small. Probably the simplest course is to set

$$X' = a_1 u + b_1 v, \qquad Y' = a_2 u + b_2 v,$$

where $a_1$, $a_2$, $b_1$, $b_2$ are constants such that, near the solution,

$$\frac{a_1}{b_1} \approx -\frac{v_y}{u_y}, \qquad \frac{a_2}{b_2} \approx -\frac{v_x}{u_x}.$$

Let us apply this procedure to the example we have already worked (Table III). Suppose we wish to use values of x and y having not more than two decimal places. Within this restriction, we can still carry through the first few steps indicated in Table III to ascertain that $x_0 \approx 1.514$, $y_0 \approx 0.375$, where $(x_0, y_0)$ is the desired solution. At the point (1.51, 0.37) we have

$$X = 3.0201u - 2.25v, \qquad Y = 1.1174u - 3.39v.$$
Noting the ratios of the coefficients of u and v, we select

$$X' = 4u - 3v, \qquad Y' = u - 3v.$$

Next we evaluate X' and Y' for the 16 points having x-coordinates 1.50, 1.51, 1.52, 1.53 and y-coordinates 0.36, 0.37, 0.38, 0.39, as shown in Table V. Starting with the four points for which x = 1.50, we interpolate to find the values of y and X' corresponding to Y' = 0; they are $y_1 = .3750000007$, $X'_1 = -.1406250025$. We proceed in the same way with the points corresponding to the other values of x; the results, as shown, are $y_2 = .3749981706$, $X'_2 = -.03908741039$; $y_3 = .3749927660$, $X'_3 = .06302572977$; $y_4 = .3749839124$, $X'_4 = .1657149545$. (The extra digits given in Table V are to take care of rounding-off.) Finally, using these values, we interpolate to find the values of x and y corresponding to X' = 0, and get

$$x = 1.5138345192, \qquad y = .3749965140.$$

Comparing these results with those obtained earlier, we see that they are in error by about 1 unit in the ninth place, a distinct improvement over the four correct places that could have been secured without using this device. Note that if we had not had our earlier results for comparison, a check could have been obtained by carrying through the interpolation in the reverse order, i.e., starting with fixed values of y and finding values of x and Y' corresponding to X' = 0.
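The two-stage procedure of this section can be sketched in Python as follows, using the same 4 × 4 grid as Table V and the choice $X' = 4u - 3v$, $Y' = u - 3v$ for system (13). Each interpolation uses all four points through Neville's scheme (our choice of algorithm; the text does not prescribe one), and the function values are computed in double precision rather than rounded as in the table, so the last digits may differ slightly from those quoted above.

```python
def neville(nodes, values, t):
    """Value at t of the polynomial through (nodes[i], values[i]), by Neville's scheme."""
    p = list(values)
    n = len(nodes)
    for k in range(1, n):
        for i in range(n - k):
            p[i] = ((t - nodes[i + k]) * p[i] + (nodes[i] - t) * p[i + 1]) / (
                nodes[i] - nodes[i + k]
            )
    return p[0]


def u_v(x, y):
    """The functions u and v of system (13)."""
    return x * x + x * y + y * y - 3.0, x * x * y + y * y - 1.0


def x_prime(x, y):
    u, v = u_v(x, y)
    return 4.0 * u - 3.0 * v


def y_prime(x, y):
    u, v = u_v(x, y)
    return u - 3.0 * v


xs = [1.50, 1.51, 1.52, 1.53]
ys = [0.36, 0.37, 0.38, 0.39]

# Stage 1: for each fixed x, regard y and X' as functions of Y' and interpolate to Y' = 0.
curve = []  # entries (x_i, y_i, X'_i) lying (approximately) on the curve Y' = 0
for x in xs:
    yp = [y_prime(x, y) for y in ys]
    xp = [x_prime(x, y) for y in ys]
    curve.append((x, neville(yp, ys, 0.0), neville(yp, xp, 0.0)))

# Stage 2: along Y' = 0, regard x and y as functions of X' and interpolate to X' = 0.
X_on_curve = [c[2] for c in curve]
x0 = neville(X_on_curve, [c[0] for c in curve], 0.0)
y0 = neville(X_on_curve, [c[1] for c in curve], 0.0)

for x, y_i, X_i in curve:
    print(f"x = {x:.2f}: y = {y_i:.10f}, X' = {X_i:.11f}")
print(f"x0 = {x0:.10f}, y0 = {y0:.10f}")
```

The reverse-order check mentioned in the text amounts to swapping the roles of x and y (and of X' and Y') in the two stages.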

As in the case of one equation in one unknown, derivatives could be brought into the interpolation scheme, permitting greater accuracy with fewer points. But the derivatives needed would be $\partial x/\partial X'$, $\partial x/\partial Y'$, etc., and the general setup would be rather awkward, so that extra labor would probably be required.

9. Three or more equations. The methods discussed in this section are readily extended to the solution of three or more simultaneous equations in an equal number of unknowns. For example, if we are given three equations of the form

$$u(x, y, z) = 0, \qquad v(x, y, z) = 0, \qquad w(x, y, z) = 0,$$

we define new variables

$$X = \begin{vmatrix} u & v & w \\ u_y & v_y & w_y \\ u_z & v_z & w_z \end{vmatrix}, \qquad Y = \begin{vmatrix} u & v & w \\ u_z & v_z & w_z \\ u_x & v_x & w_x \end{vmatrix}, \qquad Z = \begin{vmatrix} u & v & w \\ u_x & v_x & w_x \\ u_y & v_y & w_y \end{vmatrix},$$

which are analogous to the X and Y of (8); from this point on the work is practically the same as before.
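For three equations the same idea leads to a Newton step in which each correction is a ratio of determinants. The Python sketch below writes $F = (u, v, w)$, $F_x = (u_x, v_x, w_x)$, etc., and assumes the cyclic arrangement $X = \det(F, F_y, F_z)$, $Y = \det(F, F_z, F_x)$, $Z = \det(F, F_x, F_y)$ with $J = \det(F_x, F_y, F_z)$; with this arrangement Cramer's rule applied to the linearized equations gives the corrected point $(x - X/J,\ y - Y/J,\ z - Z/J)$. The test system, with a root at (1, 2, 3), is hypothetical.

```python
def det3(r0, r1, r2):
    """Determinant of the 3 x 3 matrix with rows r0, r1, r2."""
    (a, b, c), (d, e, f), (g, h, i) = r0, r1, r2
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def system(x, y, z):
    """Hypothetical test system with a root at (1, 2, 3); returns F and its derivative rows."""
    F = (x + y + z - 6.0, x * x + y * y + z * z - 14.0, x * y * z - 6.0)  # (u, v, w)
    Fx = (1.0, 2.0 * x, y * z)  # (u_x, v_x, w_x)
    Fy = (1.0, 2.0 * y, x * z)  # (u_y, v_y, w_y)
    Fz = (1.0, 2.0 * z, x * y)  # (u_z, v_z, w_z)
    return F, Fx, Fy, Fz


def newton3_step(x, y, z):
    F, Fx, Fy, Fz = system(x, y, z)
    J = det3(Fx, Fy, Fz)  # Jacobian of (u, v, w) with respect to (x, y, z)
    X = det3(F, Fy, Fz)   # cyclic arrangement assumed for X, Y, Z
    Y = det3(F, Fz, Fx)
    Z = det3(F, Fx, Fy)
    return x - X / J, y - Y / J, z - Z / J


def solve3(x, y, z, tol=1e-13, max_iter=50):
    """Repeat the determinant form of Newton's step until the corrections are below tol."""
    for _ in range(max_iter):
        nx, ny, nz = newton3_step(x, y, z)
        done = max(abs(nx - x), abs(ny - y), abs(nz - z)) < tol
        x, y, z = nx, ny, nz
        if done:
            break
    return x, y, z


print(solve3(0.9, 2.1, 3.05))
```

For more than three unknowns the determinants are best replaced by the solution of the linear system itself, which gives the same correction at lower cost.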
ESTIMATION OF A PARAMETER WHEN THE NUMBER OF
UNKNOWN PARAMETERS INCREASES INDEFINITELY WITH 
THE NUMBER OF OBSERVATIONS 

By Abraham Wald 
Columbia University 

Summary. Necessary and sufficient conditions are given for the existence of a uniformly consistent estimate of an unknown parameter $\theta$ when the successive observations are not necessarily independent and the number of unknown parameters involved in the joint distribution of the observations increases indefinitely with the number of observations. In analogy with R. A. Fisher's information function, the amount of information contained in the first n observations regarding $\theta$ is defined. A sufficient condition for the non-existence of a uniformly consistent estimate of $\theta$ is given in section 3 in terms of the information function. Section 4 gives a simplified expression for the amount of information when the successive observations are independent.

1. Introduction. J. Neyman has recently treated the following estimation problem¹: Let $X_1, X_2, \ldots$, etc., be a sequence of independent chance variables the distribution of each of which depends on some unknown parameters. Two kinds of parameters are distinguished, structural and incidental parameters. A parameter $\theta$ is called structural if there exists an infinite subsequence of the sequence $\{X_i\}$ such that the distribution of each of the chance variables in the subsequence depends on $\theta$. Any parameter which is not structural is called incidental. Neyman has considered the case when there are a finite number of structural parameters, say $\theta_1, \ldots, \theta_k$, and an infinite sequence $\{\xi_i\}$, $(i = 1, 2, \ldots, \text{ad inf.})$, of incidental parameters. He has studied the problem of consistent and efficient estimation of the structural parameters and has obtained several interesting results. He has shown, among other things, that the maximum likelihood estimate of a structural parameter $\theta$ need not be consistent, even when consistent estimates of $\theta$ exist. Neyman has also given a method for obtaining consistent estimates of the structural parameters. This method, however, is applicable only under certain restrictive conditions.
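A standard illustration of Neyman's point (the Neyman–Scott pairs example; it is not worked in this paper) can be simulated in a few lines of Python. Observations come in pairs $X_{i1}, X_{i2}$, independent and normal with incidental mean $\xi_i$ and common structural variance $\sigma^2$. Maximizing the likelihood jointly over $\sigma^2$ and all the $\xi_i$ gives $\hat\sigma^2 = (4n)^{-1}\sum_i (X_{i1} - X_{i2})^2$, which converges to $\sigma^2/2$ rather than $\sigma^2$, while $2\hat\sigma^2$ is consistent. The range from which the $\xi_i$ are drawn and the sample sizes are arbitrary choices for the sketch.

```python
import random


def neyman_scott_mle(n_pairs, sigma=1.0, seed=0):
    """MLE of sigma^2 from n_pairs pairs (X_i1, X_i2) ~ N(xi_i, sigma^2), all xi_i unknown."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        xi = rng.uniform(-10.0, 10.0)  # incidental parameter of this pair (illustrative)
        x1 = rng.gauss(xi, sigma)
        x2 = rng.gauss(xi, sigma)
        total += (x1 - x2) ** 2
    # Joint MLE: each xi_i is estimated by its pair mean, and
    # (1/2n) * sum_i sum_j (x_ij - mean_i)^2 = (1/4n) * sum_i (x_i1 - x_i2)^2.
    return total / (4.0 * n_pairs)


for n in (100, 10_000, 100_000):
    est = neyman_scott_mle(n)
    print(f"n = {n:>6}: MLE = {est:.4f}, corrected 2*MLE = {2 * est:.4f}")
```

The bias does not shrink as n grows because every new pair brings a new incidental parameter, which is exactly the situation the present paper considers.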

In this paper we shall consider a more general case than that treated by Neyman, but we shall concentrate on one aspect of the problem, namely that of the existence of consistent estimates.

Let $\{X_i\}$, $(i = 1, 2, \ldots, \text{ad inf.})$, be a sequence of chance variables, not necessarily independent of each other. It is assumed that for each n the chance variables $X_1, \ldots, X_n$ admit a joint probability density function $p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)$, where $\theta, \xi_1, \xi_2, \ldots$, etc. are unknown parameters.²

¹ Address given by J. Neyman at the meeting of the Institute of Mathematical Statistics in Atlantic City, January, 1947.

² While $\theta$ is assumed to be a real variable, we admit $\xi_i$ to be a finite dimensional vector, i.e., $\xi_i = (\xi_{i1}, \ldots, \xi_{ik_i})$, where $k_i$ may be any finite positive integer.
We shall require that the consistency relations among the density functions $p_1, p_2, \ldots$, etc. be fulfilled, i.e.,

$$\int_{-\infty}^{+\infty} p_{n+1}\, dx_{n+1} = p_n, \qquad (n = 1, 2, \ldots, \text{ad inf.}). \tag{1.1}$$

It should be remarked that it is not postulated that $p_n$ actually depends on all the parameters that appear as arguments in $p_n$. It is merely assumed that $p_n$ does not depend on any parameter that does not appear as an argument in $p_n$, i.e., $p_n$ does not depend on $\xi_i$ for any $i > n$. It follows, however, from (1.1) that if $p_n$ depends on a parameter $\xi$, then also $p_m$ depends on $\xi$ for any $m > n$.

Neyman's definition of structural and incidental parameters can be extended to the case of dependent observations considered here by saying that the distribution of $X_i$ does not depend on a parameter $\xi$ if and only if the conditional distribution of $X_i$ for any given values of $X_1, \ldots, X_{i-1}$ does not depend on $\xi$. It is not postulated that each of the parameters $\xi_1, \xi_2, \ldots$, etc. is incidental; some of them may be structural. We shall not make an explicit distinction between structural and incidental parameters, since for the purposes of the present paper this does not seem to be necessary.

In this paper we shall deal with the problem of formulating conditions under which a uniformly consistent estimate of $\theta$ exists. A statistic $t_n(x_1, \ldots, x_n)$ is said to be a uniformly consistent estimate of $\theta$ if for any positive $\delta$

$$\lim_{n \to \infty} \operatorname{prob}\{|t_n - \theta| < \delta\} = 1 \tag{1.2}$$

uniformly in $\theta$ and the $\xi$'s.

In section 2 a necessary and sufficient condition is given for the existence of a uniformly consistent estimate of $\theta$. In section 3 the amount of information supplied by the first n observations concerning $\theta$ is defined. It is then shown that if the amount of information is a bounded function of n over a non-degenerate $\theta$-interval, no uniformly consistent estimate of $\theta$ exists. Section 4 gives a simplified formula for the amount of information in the case when the X's are independently distributed.

2. A necessary and sufficient condition for the existence of a uniformly consistent estimate of $\theta$. In deriving a necessary and sufficient condition for the existence of a uniformly consistent estimate of $\theta$, use will be made of some results contained in a publication of the author [1] dealing with statistical decision functions which minimize the maximum risk. In [1] it is assumed that the domain of each of the unknown parameters is a closed and bounded set and that $p_n$ is continuous jointly in all of its arguments. Thus, in order to be able to use the results obtained in [1], we shall have to make the same assumptions here. In what follows we shall, therefore, assume that each of the parameters $\theta, \xi_1, \xi_2, \ldots$, etc. is restricted to a finite closed interval and that $p_n$ is a continuous function of $x_1, \ldots, x_n, \theta, \xi_1, \ldots, \xi_n$.

Let $[a, b]$ $(a < b)$ be the $\theta$-interval to which the values of $\theta$ are restricted. Clearly, if $t_n(x_1, \ldots, x_n)$, $(n = 1, 2, \ldots, \text{ad inf.})$, is a uniformly consistent
estimate of $\theta$, then also $t_n^*$ is a uniformly consistent estimate of $\theta$, where $t_n^* = t_n$ when $a \le t_n \le b$, $t_n^* = a$ when $t_n < a$, and $t_n^* = b$ when $t_n > b$. Thus, without loss of generality, we can restrict ourselves to estimates $t_n$ which can take values only in the interval $[a, b]$. Uniform consistency of $t_n$ is then equivalent to the condition

$$\lim_{n \to \infty} E[(t_n - \theta)^2 \mid \theta, \xi_1, \ldots, \xi_n] = 0 \tag{2.1}$$

uniformly in $\theta$ and the $\xi$'s. (Since $|t_n - \theta| \le b - a$, we have $\delta^2 P\{|t_n - \theta| \ge \delta\} \le E(t_n - \theta)^2 \le \delta^2 + (b - a)^2 P\{|t_n - \theta| \ge \delta\}$ for every $\delta > 0$.) For any chance variable u the symbol $E(u \mid \theta, \xi_1, \xi_2, \ldots)$ denotes the expected value of u when $\theta, \xi_1, \xi_2, \ldots$ are the true parameter values.

In [1] a non-negative function $W(t_n, \theta)$, called weight function, is introduced which expresses the loss suffered when $t_n$ is the value of the estimate and $\theta$ is the true value of the parameter. The risk is defined in [1] as the expected value of the loss, i.e., the risk is given by

$$r_n(\theta, \xi_1, \ldots, \xi_n) = E[W(t_n, \theta) \mid \theta, \xi_1, \ldots, \xi_n]. \tag{2.2}$$

If we put $W(t_n, \theta) = (t_n - \theta)^2$, we have

$$r_n(\theta, \xi_1, \ldots, \xi_n) = E[(t_n - \theta)^2 \mid \theta, \xi_1, \ldots, \xi_n]. \tag{2.3}$$

It can easily be verified that Assumptions 1–4 in section 3 of [1] are fulfilled for the weight function $W(t_n, \theta) = (t_n - \theta)^2$.³ Thus, all results obtained in [1] can be applied to the risk function given in (2.3). According to Theorem 4.1 in [1] the risk function given in (2.3) is a continuous function of $\theta, \xi_1, \ldots, \xi_n$ for any arbitrary estimate $t_n$. We shall denote the maximum of (2.3) with respect to $\theta, \xi_1, \ldots, \xi_n$ by $r_n[t_n]$. Thus $r_n[t_n]$ is a functional which associates a non-negative value with any estimate function $t_n$.

It follows from (2.1) that $t_n$ is a uniformly consistent estimate of $\theta$ if and only if

$$\lim_{n \to \infty} r_n[t_n] = 0. \tag{2.4}$$

For any $\theta$ and for any n let $F_n(\xi_1, \ldots, \xi_n \mid \theta)$ be a cumulative distribution function of $\xi_1, \ldots, \xi_n$. Let, furthermore,

$$q_n(x_1, \ldots, x_n \mid \theta, F_n) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)\, dF_n(\xi_1, \ldots, \xi_n \mid \theta). \tag{2.5}$$

We do not require that $F_1, F_2, \ldots$, etc. satisfy the consistency relations, i.e., $\lim_{\xi_{n+1} \to \infty} F_{n+1}(\xi_1, \ldots, \xi_{n+1} \mid \theta)$ is not necessarily equal to $F_n(\xi_1, \ldots, \xi_n \mid \theta)$.

³ In verifying Assumption 4, we may assume that $p_n$ is always $> 0$, since for any given values $\theta, \xi_1, \ldots, \xi_n$ we may restrict the domain of $(x_1, \ldots, x_n)$ to the subset of the sample space where $p_n > 0$.
Hence, also the distributions $q_n$ do not necessarily satisfy the consistency relations. Clearly

$$r_n[t_n] \ge \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} (t_n - \theta)^2\, q_n(x_1, \ldots, x_n \mid \theta, F_n)\, dx_1 \cdots dx_n \tag{2.6}$$

for any $\theta$ and any $F_n$, since the right-hand side is an average of the risk (2.3) with respect to $F_n$. Hence, (2.4) and (2.6) imply that if $t_n$ is a uniformly consistent estimate of $\theta$, then $t_n$ remains a uniformly consistent estimate of $\theta$ also when $q_n$ is the distribution of $X_1, \ldots, X_n$ for any arbitrary choice of $F_n$.
For each n let $C_n(\theta, \xi_1, \ldots, \xi_n)$ be a joint cumulative distribution function of $\theta, \xi_1, \ldots, \xi_n$. If this is regarded as an a priori distribution of $\theta, \xi_1, \ldots, \xi_n$, and if our aim is to choose $t_n$ so that

$$E(t_n - \theta)^2 = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} (t_n - \theta)^2\, p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)\, dC_n\, dx_1 \cdots dx_n \tag{2.7}$$

is a minimum, then the best choice of $t_n$ is to put it equal to the a posteriori mean value of $\theta$. Let $\bar t_n(x_1, \ldots, x_n; C_n)$ denote the a posteriori mean value of $\theta$ when $C_n$ is the a priori distribution, i.e.,

$$\bar t_n(x_1, \ldots, x_n; C_n) = \frac{\int \theta\, p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)\, dC_n}{\int p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)\, dC_n}, \tag{2.8}$$

where the integration is to be taken over the whole domain of the parameters $\theta, \xi_1, \ldots, \xi_n$. Let, furthermore, $\bar r_n[C_n]$ denote the value of (2.7) when $t_n = \bar t_n(x_1, \ldots, x_n; C_n)$. According to Theorem 4.4 in [1] there exists a particular distribution $C_n^0$, called a least favorable distribution, such that

$$\bar r_n[C_n] \le \bar r_n[C_n^0] \tag{2.9}$$

for all $C_n$. Let

$$t_n^0(x_1, \ldots, x_n) = \bar t_n(x_1, \ldots, x_n; C_n^0). \tag{2.10}$$

It follows from Theorems 4.5 and 5.1 in [1] that for any estimate $t_n$ we have

$$r_n[t_n] \ge r_n[t_n^0] = \bar r_n[C_n^0]. \tag{2.11}$$

Hence, a necessary and sufficient condition for the existence of a uniformly consistent estimate of $\theta$ is that

$$\lim_{n \to \infty} \bar r_n[C_n^0] = 0. \tag{2.12}$$

Let $F_n(\xi_1, \ldots, \xi_n \mid \theta)$ denote the conditional cumulative distribution of $\xi_1, \ldots, \xi_n$ for given $\theta$ that results from the joint distribution $C_n(\theta, \xi_1, \ldots, \xi_n)$, and let $F_n^0(\xi_1, \ldots, \xi_n \mid \theta)$ correspond to $C_n^0(\theta, \xi_1, \ldots, \xi_n)$. Clearly, any uniformly consistent estimate of $\theta$ with respect to $p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)$ is a uniformly consistent estimate also with respect to $q_n(x_1, \ldots, x_n \mid \theta, F_n)$ for any $F_n$. On the other hand, if $q_n(x_1, \ldots, x_n \mid \theta, F_n^0)$ admits a uniformly consistent estimate of $\theta$, equation (2.12) must hold and, therefore, $p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)$ admits a uniformly consistent estimate of $\theta$. Hence we arrive at the following theorem:

Theorem 2.1. A necessary and sufficient condition that $p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)$ admit a uniformly consistent estimate of $\theta$ is that $q_n(x_1, \ldots, x_n \mid \theta, F_n)$ admit a uniformly consistent estimate of $\theta$ for any arbitrary choice of $F_n$.

3. Amount of information contained in the first n observations concerning the parameter $\theta$. We shall make the following assumptions:

Assumption 1. The first two derivatives of $p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)$ with respect to $\theta$ exist.

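Assumption 2. We have

$$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \ldots\; dx_1 \cdots dx_n < \infty \tag{3.1}$$

and

$$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \ldots\; dx_1 \cdots dx_n < \infty \tag{3.2}$$

for any n.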

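Assumption 3. The integral

$$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \frac{\partial^2 \log q_n(x_1, \ldots, x_n \mid \theta, F_n)}{\partial \theta^2}\, q_n(x_1, \ldots, x_n \mid \theta, F_n)\, dx_1 \cdots dx_n$$

exists for any $\theta$, $F_n$ and n, where $q_n$ is defined by (2.5).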
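Since

$$\frac{\partial^2 \log q_n}{\partial \theta^2} = \frac{1}{q_n} \frac{\partial^2 q_n}{\partial \theta^2} - \left(\frac{\partial \log q_n}{\partial \theta}\right)^2$$

and since, because of Assumptions 1 and 2,

$$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \frac{\partial^2 q_n}{\partial \theta^2}\, dx_1 \cdots dx_n = 0,$$

we have

$$-\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \frac{\partial^2 \log q_n}{\partial \theta^2}\, q_n\, dx_1 \cdots dx_n = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \left(\frac{\partial \log q_n}{\partial \theta}\right)^2 q_n\, dx_1 \cdots dx_n. \tag{3.3}$$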
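Let

$$c_n(\theta) = \mathop{\mathrm{g.l.b.}}_{F_n} \left[-\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \frac{\partial^2 \log q_n}{\partial \theta^2}\, q_n\, dx_1 \cdots dx_n\right]. \tag{3.4}$$

Clearly $c_n(\theta) \ge 0$. We shall now show that

$$c_{n+1}(\theta) \ge c_n(\theta) \qquad \text{for } n = 1, 2, \ldots, \text{ad inf.} \tag{3.5}$$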

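In fact, we can write

$$\frac{\partial^2 \log q_{n+1}(x_1, \ldots, x_{n+1} \mid \theta, F_{n+1})}{\partial \theta^2} = \frac{\partial^2 \log q_n(x_1, \ldots, x_n \mid \theta, F_n^*)}{\partial \theta^2} + \frac{\partial^2 \log f_{n+1}(x_{n+1} \mid x_1, \ldots, x_n, \theta, F_{n+1})}{\partial \theta^2}, \tag{3.6}$$

where $F_n^* = \lim_{\xi_{n+1} \to \infty} F_{n+1}$ and $f_{n+1}(x_{n+1} \mid x_1, \ldots, x_n, \theta, F_{n+1})$ is the conditional probability density function of $X_{n+1}$ given the values of $x_1, \ldots, x_n$, assuming that the joint density function of $X_1, \ldots, X_{n+1}$ is given by $q_{n+1}(x_1, \ldots, x_{n+1} \mid \theta, F_{n+1})$. Since $c_n(\theta) \le$ expected value of

$$-\frac{\partial^2 \log q_n(x_1, \ldots, x_n \mid \theta, F_n^*)}{\partial \theta^2}$$

and since the expected value of $-\partial^2 \log f_{n+1}/\partial \theta^2$ is $\ge 0$, inequality (3.5) must hold.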

In analogy with R. A. Fisher's information function, we shall call $c_n(\theta)$ the amount of information contained in the first n observations regarding $\theta$. We shall now prove the following theorem:

Theorem 3.1. If $\lim_{n \to \infty} c_n(\theta) \le c < \infty$ over a finite non-degenerate $\theta$-interval I, then there is no uniformly consistent estimate of $\theta$.

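Proof. If $c_n(\theta) \le c < \infty$ for every n over the interval I (which, by (3.5), is the case under the hypothesis of the theorem), then for each n there exists a distribution $F_n(\xi_1, \ldots, \xi_n \mid \theta)$ such that

$$0 < -\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \frac{\partial^2 \log q_n(x_1, \ldots, x_n \mid \theta, F_n)}{\partial \theta^2}\, q_n(x_1, \ldots, x_n \mid \theta, F_n)\, dx_1 \cdots dx_n \le c + 1 \tag{3.7}$$

for all n and for all $\theta$ in I. Let $t_n$ be any estimate and let

$$b_n(\theta) = E(t_n - \theta) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} (t_n - \theta)\, q_n(x_1, \ldots, x_n \mid \theta, F_n)\, dx_1 \cdots dx_n = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} t_n\, q_n(x_1, \ldots, x_n \mid \theta, F_n)\, dx_1 \cdots dx_n - \theta. \tag{3.8}$$

Since $t_n$ is bounded, it follows from Assumptions 1 and 2 that $db_n(\theta)/d\theta$ exists and is a continuous function of $\theta$. According to a theorem by Cramér [2] we have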

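$$E(t_n - \theta)^2 = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} (t_n - \theta)^2\, q_n\, dx_1 \cdots dx_n \ge \frac{\left(1 + \dfrac{db_n(\theta)}{d\theta}\right)^2}{c + 1} \tag{3.9}$$

for all $\theta$ in I. Thus, in order that $\lim_{n \to \infty} E(t_n - \theta)^2 = 0$ uniformly in $\theta$, we must have

$$\lim_{n \to \infty} \frac{db_n(\theta)}{d\theta} = -1 \tag{3.10}$$

uniformly in $\theta$ over I. Let I be the interval ranging from g to h $(g < h)$. From (3.10) it follows that

$$\lim_{n \to \infty} [b_n(h) - b_n(g)] = g - h. \tag{3.11}$$

Hence

$$\liminf_{n \to \infty}\, \max_{\theta \in I}\, [b_n(\theta)]^2 \ge \frac{(h - g)^2}{4}.$$

Since $E(t_n - \theta)^2 \ge [b_n(\theta)]^2$, $E(t_n - \theta)^2$ cannot converge to zero uniformly in $\theta$, and Theorem 3.1 is proved.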

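4. Formula for $c_n(\theta)$ when $p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)$ is equal to $\varphi_1(x_1 \mid \theta, \xi_1)\, \varphi_2(x_2 \mid \theta, \xi_2) \cdots \varphi_n(x_n \mid \theta, \xi_n)$. Let $g_i(x_i \mid x_1, \ldots, x_{i-1}, \theta, F_n)$ be the conditional probability density of $X_i$ given $x_1, \ldots, x_{i-1}$ when the joint density function of $X_1, \ldots, X_n$ is given by $q_n(x_1, \ldots, x_n \mid \theta, F_n)$, $(i \le n)$. Clearly,

$$q_n(x_1, \ldots, x_n \mid \theta, F_n) = \prod_{i=1}^{n} g_i(x_i \mid x_1, \ldots, x_{i-1}, \theta, F_n). \tag{4.1}$$

Now

$$g_i(x_i \mid x_1, \ldots, x_{i-1}, \theta, F_n) = \int \varphi_i(x_i \mid \theta, \xi_i)\, dH_i(\xi_i \mid x_1, \ldots, x_{i-1}, \theta, F_n), \tag{4.2}$$

where $H_i(\xi_i \mid x_1, \ldots, x_{i-1}, \theta, F_n)$ denotes the conditional cumulative distribution of $\xi_i$ given $x_1, \ldots, x_{i-1}$, assuming that $F_n(\xi_1, \ldots, \xi_n \mid \theta)$ is the joint cumulative distribution of $\xi_1, \ldots, \xi_n$ and $p_n(x_1, \ldots, x_n \mid \theta, \xi_1, \ldots, \xi_n)$ is the joint density of $X_1, \ldots, X_n$ for any given values of $\theta, \xi_1, \ldots, \xi_n$.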

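It follows from (4.2) that

$$-\int_{-\infty}^{+\infty} \frac{\partial^2 \log g_i}{\partial \theta^2}\, g_i\, dx_i \ge c_{ni}(\theta) = \mathop{\mathrm{g.l.b.}}_{G_i(\xi_i)} \left[-\int_{-\infty}^{+\infty} \frac{\partial^2 \log \int \varphi_i(x_i \mid \theta, \xi_i)\, dG_i(\xi_i)}{\partial \theta^2} \int \varphi_i(x_i \mid \theta, \xi_i)\, dG_i(\xi_i)\, dx_i\right],$$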
where $G_i(\xi_i)$ may be any cumulative distribution of $\xi_i$. Hence

$$\mathop{\mathrm{g.l.b.}}_{F_n} E\left[-\frac{\partial^2 \log g_i}{\partial \theta^2}\right] = c_{ni}(\theta) \tag{4.3}$$

and, therefore,

$$c_n(\theta) = \sum_{i=1}^{n} c_{ni}(\theta). \tag{4.4}$$

The quantity $c_{ni}(\theta)$ is simply the amount of information contained in the ith observation alone. Thus, formula (4.4) says that if $X_1, \ldots, X_n$ are independent, the total information contained in the first n observations is equal to the sum of the amounts of information contained in each of these observations singly.
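To make (3.4) and (4.4) concrete, consider a hypothetical case with no incidental parameters: $X_i$ independent and normal with mean $\theta$ and known standard deviation $\sigma_i = i$. Then $q_n = p_n$, each observation carries information $1/\sigma_i^2$, and by (4.4) $c_n(\theta) = \sum_{i \le n} 1/i^2$, which stays below $\pi^2/6$; by Theorem 3.1 there is then no uniformly consistent estimate of $\theta$ over any finite interval. The Python sketch below evaluates $-E[\partial^2 \log p/\partial\theta^2]$ numerically for each observation (a central difference in $\theta$ and the midpoint rule in x, both our choices) and adds the results.

```python
import math


def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def information_one(theta, sd, h=1e-3, width=10.0, steps=2000):
    """-E[d^2 log p(x | theta) / d theta^2] for one N(theta, sd^2) observation.

    The second derivative is a central difference in theta; the expectation is a
    midpoint-rule integral over x in [theta - width*sd, theta + width*sd].
    """
    dx = 2.0 * width * sd / steps
    total = 0.0
    for k in range(steps):
        x = theta - width * sd + (k + 0.5) * dx
        d2 = (normal_logpdf(x, theta + h, sd)
              - 2.0 * normal_logpdf(x, theta, sd)
              + normal_logpdf(x, theta - h, sd)) / (h * h)
        total += -d2 * math.exp(normal_logpdf(x, theta, sd)) * dx
    return total


def total_information(n, theta=0.0):
    """c_n(theta) for independent X_i ~ N(theta, i^2), summed as in (4.4)."""
    return sum(information_one(theta, float(i)) for i in range(1, n + 1))


for n in (1, 5, 10, 40):
    c = total_information(n)
    print(f"n = {n:>2}: c_n = {c:.6f}  (bound pi^2/6 = {math.pi ** 2 / 6:.6f})")
```

In this example the weighted mean $\sum (X_i/i^2) / \sum (1/i^2)$ has variance $1/c_n \ge 6/\pi^2$ for every n, in line with the theorem.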
INVERSION FORMULAE FOR THE DISTRIBUTION OF RATIOS

By John Gurland 
University of California, Berkeley 

1. Summary. The use of the repeated Cauchy principal value affords greater facility in the application of inversion formulae involving characteristic functions. Formula (2) below is especially useful in obtaining the inversion formula (1) for the distribution of the ratio of linear combinations of random variables which may be correlated. Formulae (1), (10), (12) generalize the special cases considered by Cramér [2], Curtiss [4], Geary [6], and are free of some restrictions they impose. The results are further generalized in section 6, where inversion formulae are given for the joint distribution of several ratios. In section 7, the joint distribution of several ratios of quadratic forms in random variables $X_1, X_2, \ldots, X_n$ having a multivariate normal distribution is considered.


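2. Introduction. We shall write

$$\mathcal{P}\!\!\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(t_1, t_2, \ldots, t_n)\, dt_1\, dt_2 \cdots dt_n = \lim_{\substack{\epsilon_n \to 0 \\ T_n \to \infty}} \cdots \lim_{\substack{\epsilon_1 \to 0 \\ T_1 \to \infty}} \int_{\epsilon_n < |t_n| < T_n} \cdots \int_{\epsilon_1 < |t_1| < T_1} g(t_1, t_2, \ldots, t_n)\, dt_1\, dt_2 \cdots dt_n,$$

which might be called the repeated Cauchy principal value of

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(t_1, t_2, \ldots, t_n)\, dt_1 \cdots dt_n,$$

and which we shall use frequently. The results of this article may be regarded as extensions of the following theorem, proved in section 4.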

Theorem 1. Let $X_1, X_2, \ldots, X_n$ have the joint distribution function $F(x_1, x_2, \ldots, x_n)$ with corresponding characteristic function $\phi(t_1, t_2, \ldots, t_n)$. Let $G(x)$ be the distribution function of $(a_1X_1 + \cdots + a_nX_n)/(b_1X_1 + \cdots + b_nX_n)$, where $a_1, a_2, \ldots, a_n, b_1, b_2, \ldots, b_n$ are real numbers. If

$$P\left\{\sum_{i=1}^{n} b_i X_i < 0\right\} = 0,$$

then

$$G(x) + G(x - 0) = 1 - \frac{1}{\pi i}\, \mathcal{P}\!\!\int_{-\infty}^{\infty} \phi\{t(a_1 - b_1x), \ldots, t(a_n - b_nx)\}\, \frac{dt}{t}. \tag{1}$$


3. An inversion formula for distribution functions. Let $F(x)$ be a distribution function and $\phi(t)$ be the corresponding characteristic function. Then the following inversion formula holds:
$$F(\xi) + F(\xi - 0) = 1 - \frac{1}{\pi i}\, \mathcal{P}\!\!\int_{-\infty}^{\infty} e^{-it\xi} \phi(t)\, \frac{dt}{t}. \tag{2}$$

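Proof.

$$\frac{1}{\pi i} \left(\int_{-T}^{-\epsilon} + \int_{\epsilon}^{T}\right) e^{-it\xi} \phi(t)\, \frac{dt}{t} = \int_{-\infty}^{\infty} dF(x)\, \frac{1}{\pi i} \left(\int_{-T}^{-\epsilon} + \int_{\epsilon}^{T}\right) e^{it(x - \xi)}\, \frac{dt}{t},$$

by the Fubini theorem on the inversion of the order of integration. But

$$\lim_{\substack{\epsilon \to 0 \\ T \to \infty}} \frac{1}{\pi i} \left(\int_{-T}^{-\epsilon} + \int_{\epsilon}^{T}\right) e^{it(x - \xi)}\, \frac{dt}{t} = \lim_{\substack{\epsilon \to 0 \\ T \to \infty}} \frac{2}{\pi} \int_{\epsilon}^{T} \frac{\sin t(x - \xi)}{t}\, dt = \operatorname{sgn}(x - \xi),$$

where $\operatorname{sgn} y = -1, 0, 1$ according as $y < 0$, $y = 0$, $y > 0$. Since $\left|\int_{\epsilon}^{T} \frac{\sin at}{t}\, dt\right|$ is uniformly bounded in a, $\epsilon$ and T, the principle of bounded convergence for Lebesgue integrals implies that

$$\lim_{\substack{\epsilon \to 0 \\ T \to \infty}} \int_{-\infty}^{\infty} dF(x)\, \frac{1}{\pi i} \left(\int_{-T}^{-\epsilon} + \int_{\epsilon}^{T}\right) e^{it(x - \xi)}\, \frac{dt}{t} = \int_{-\infty}^{\infty} \operatorname{sgn}(x - \xi)\, dF(x)$$
$$= \left(\int_{-\infty}^{\xi - 0} + \int_{\xi} + \int_{\xi + 0}^{\infty}\right) \operatorname{sgn}(x - \xi)\, dF(x) = -F(\xi - 0) + 1 - F(\xi).$$

The required result follows at once.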
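For a continuous F, pairing t with −t in the principal value of (2) and using $\phi(-t) = \overline{\phi(t)}$ gives $F(\xi) = \tfrac12 - \tfrac1\pi \int_0^\infty \operatorname{Im}\{e^{-it\xi}\phi(t)\}\, dt/t$, an ordinary integral that can be evaluated numerically. The Python sketch below does this for the standard normal distribution, whose characteristic function is $e^{-t^2/2}$, and compares the result with the error-function form of its distribution function. The truncation point $t_{\max} = 12$ and the midpoint rule (which never evaluates the integrand at t = 0) are our choices for this example.

```python
import cmath
import math


def midpoint(f, a, b, n):
    """Composite midpoint rule; never evaluates f at the endpoints."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def cdf_from_cf(phi, xi, t_max=12.0, n=4000):
    """F(xi) for a continuous distribution with characteristic function phi.

    Folded form of (2): F(xi) = 1/2 - (1/pi) * integral_0^inf Im(e^{-i t xi} phi(t)) / t dt.
    t_max must be large enough that phi is negligible beyond it.
    """
    def integrand(t):
        return (cmath.exp(-1j * t * xi) * phi(t)).imag / t

    return 0.5 - midpoint(integrand, 0.0, t_max, n) / math.pi


def std_normal_cf(t):
    return cmath.exp(-0.5 * t * t)


for xi in (-1.5, 0.0, 0.7, 2.0):
    exact = 0.5 * (1.0 + math.erf(xi / math.sqrt(2.0)))
    print(f"xi = {xi:5.2f}: inversion = {cdf_from_cf(std_normal_cf, xi):.10f}, exact = {exact:.10f}")
```

If F has a jump at $\xi$, the same integral returns the average $\tfrac12\{F(\xi) + F(\xi - 0)\}$, which is exactly the left-hand side of (2) divided by 2.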

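Another form of (2) may be obtained as follows: Let $H(x)$, $K(x)$ be distribution functions, and $\psi(t)$, $\chi(t)$ the corresponding characteristic functions. Setting $F = H$, $\phi = \psi$, $\xi = 0$ in (2) yields

$$H(0) + H(0 - 0) = 1 - \frac{1}{\pi i}\, \mathcal{P}\!\!\int_{-\infty}^{\infty} \psi(t)\, \frac{dt}{t},$$

while setting $F = K$, $\phi = \chi$ in (2) yields

$$K(\xi) + K(\xi - 0) = 1 - \frac{1}{\pi i}\, \mathcal{P}\!\!\int_{-\infty}^{\infty} e^{-it\xi} \chi(t)\, \frac{dt}{t}.$$

Clearly

$$K(\xi) + K(\xi - 0) = H(0) + H(0 - 0) + \frac{1}{\pi i}\, \mathcal{P}\!\!\int_{-\infty}^{\infty} \frac{\psi(t) - e^{-it\xi} \chi(t)}{t}\, dt. \tag{3}$$

If $K = H$, then $\psi = \chi$, and (3) reduces to a well-known inversion formula (cf. Kendall [7, p. 91]).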


4. Distribution of the ratio $(a_1X_1 + \cdots + a_nX_n)/(b_1X_1 + \cdots + b_nX_n)$ with denominator positive. Theorem 1. Let $X_1, X_2, \ldots, X_n$ have the joint distribution function $F(x_1, x_2, \ldots, x_n)$ with corresponding characteristic function $\phi(t_1, \ldots, t_n)$. Let $G(x)$ be the distribution function of $(a_1X_1 + \cdots + a_nX_n)/(b_1X_1 + \cdots + b_nX_n)$, where $a_1, a_2, \ldots, a_n, b_1, b_2, \ldots, b_n$ are real numbers. If $P\{\sum b_iX_i < 0\} = 0$, then

$$G(x) + G(x - 0) = 1 - \frac{1}{\pi i}\, \mathcal{P}\!\!\int_{-\infty}^{\infty} \phi\{t(a_1 - b_1x), \ldots, t(a_n - b_nx)\}\, \frac{dt}{t}.$$

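Proof. Note that

$$P\left\{\frac{\sum a_iX_i}{\sum b_iX_i} \le x\right\} = P\left\{\sum (a_i - b_ix)X_i \le 0\right\},$$

and let $R_x(\xi) = P\{\sum (a_i - b_ix)X_i \le \xi\}$ and $\chi_x(t)$ be the corresponding characteristic function. Clearly $R_x(0) = G(x)$, $R_x(0 - 0) = G(x - 0)$, and

$$\chi_x(t) = \phi\{t(a_1 - b_1x), \ldots, t(a_n - b_nx)\}.$$

On applying (2) to $R_x(\xi)$ and setting $\xi = 0$, the required result follows at once.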
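Theorem 1 can be checked numerically in a case where G is known in closed form. Take, hypothetically, $X_1$ standard normal and $X_2$ independent and exponential with mean 1, so that $\phi(t_1, t_2) = e^{-t_1^2/2}/(1 - it_2)$ and the denominator is positive. With $a = (1, 0)$ and $b = (0, 1)$ the ratio is $X_1/X_2$, and folding the principal value in (1) onto t > 0, as for (2), gives $G(x) = \tfrac12 - \tfrac1\pi\int_0^\infty \operatorname{Im}\,\phi(t, -tx)\,dt/t$. For x > 0 the same distribution function equals $E\,\Phi(xX_2) = \tfrac12 + e^{1/(2x^2)}\{1 - \Phi(1/x)\}$, with $\Phi$ the standard normal distribution function, which the Python sketch uses as a cross-check.

```python
import cmath
import math


def midpoint(f, a, b, n):
    """Composite midpoint rule; never evaluates f at the endpoints."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def joint_cf(t1, t2):
    # Hypothetical example: X1 ~ N(0, 1) and X2 ~ Exp(1), independent.
    return cmath.exp(-0.5 * t1 * t1) / (1.0 - 1j * t2)


def ratio_cdf(x, a=(1.0, 0.0), b=(0.0, 1.0), t_max=12.0, n=4000):
    """G(x) for (a1 X1 + a2 X2) / (b1 X1 + b2 X2) from (1), folded onto t > 0."""
    def integrand(t):
        return joint_cf(t * (a[0] - b[0] * x), t * (a[1] - b[1] * x)).imag / t

    return 0.5 - midpoint(integrand, 0.0, t_max, n) / math.pi


def ratio_cdf_closed_form(x):
    """P(X1 / X2 <= x) for x > 0: 1/2 + exp(1/(2x^2)) * (1 - Phi(1/x))."""
    return 0.5 + math.exp(1.0 / (2.0 * x * x)) * 0.5 * math.erfc(1.0 / (x * math.sqrt(2.0)))


for x in (0.5, 1.0, 2.0, 3.0):
    print(f"x = {x}: from (1) = {ratio_cdf(x):.10f}, closed form = {ratio_cdf_closed_form(x):.10f}")
```

Only the joint characteristic function enters `ratio_cdf`, so the same function applies unchanged to correlated $X_1$, $X_2$ once `joint_cf` is replaced, provided the denominator stays positive.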

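If (3) is applied in place of (2), with $K = R_x$ and $\xi = 0$, we obtain

$$G(x) + G(x - 0) = H(0) + H(0 - 0) + \frac{1}{\pi i}\, \mathcal{P}\!\!\int_{-\infty}^{\infty} \frac{\psi(t) - \phi\{t(a_1 - b_1x), \ldots, t(a_n - b_nx)\}}{t}\, dt. \tag{4}$$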

We shall consider (1) and (4) when n = 2 and

$$\begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

Two cases will be treated separately: first, when $X_1$, $X_2$ are independent; second, when $X_1$, $X_2$ may be correlated.

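If $X_1$, $X_2$ are independent, and $F(x_1, \infty) = F_1(x_1)$, $F(\infty, x_2) = F_2(x_2)$, with corresponding characteristic functions $\phi_1(t)$, $\phi_2(t)$, then (1) becomes

$$G(x) + G(x - 0) = 1 - \frac{1}{\pi i}\, \mathcal{P}\!\!\int_{-\infty}^{\infty} \frac{\phi_1(t)\, \phi_2(-tx)}{t}\, dt, \tag{5}$$

while (4) becomes, taking $H = F_2$ (so that $H(0) + H(0 - 0) = 0$ when $F_2(0) = 0$),

$$G(x) + G(x - 0) = \frac{1}{\pi i}\, \mathcal{P}\!\!\int_{-\infty}^{\infty} \frac{\phi_2(t) - \phi_1(t)\, \phi_2(-tx)}{t}\, dt. \tag{6}$$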

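Cramér [2, p. 46] proves, for $X_1$, $X_2$ independent and $F_2(0) = 0$, that

$$G(x) = \frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{\phi_2(t) - \phi_1(t)\, \phi_2(-tx)}{t}\, dt$$

under the following conditions:

(i) $X_1$ and $X_2$ have finite means;

(ii) $\displaystyle\int_{-\infty}^{\infty} \left|\frac{\phi_2(t) - \phi_1(t)\, \phi_2(-tx)}{t}\right| dt < \infty$.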
If $X_1$, $X_2$ may be correlated, then (1) becomes

$$G(x) + G(x - 0) = 1 - \frac{1}{\pi i}\, \mathcal{P}\!\!\int_{-\infty}^{\infty} \frac{\phi(t, -tx)}{t}\, dt, \tag{7}$$

while (4) becomes, taking $H = F_2$,

$$G(x) + G(x - 0) = \frac{1}{\pi i}\, \mathcal{P}\!\!\int_{-\infty}^{\infty} \frac{\phi(0, t) - \phi(t, -tx)}{t}\, dt. \tag{8}$$

Professor P. L. Hsu, in a course of lectures attended by the author at the Statistical Laboratory, University of California, gave the following result of Cramér, which was stated thus, using the above notation:

$$G(x) = \frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{\phi_2(t) - \chi_x(t)}{t}\, dt, \tag{9}$$

provided $\displaystyle\int_{-\infty}^{\infty} \left|\frac{\phi_2(t) - \chi_x(t)}{t}\right| dt < \infty$, where $F_2(0) = 0$ and $\chi_x(t)$ is defined above expression (4).

The following corollary is obtained from (1) according to well-known theorems 
concerning differentiation under the integral sign: 

Corollary. Suppose $\phi(t_1, t_2, \ldots, t_n)$ is the characteristic function corresponding to $X_1, X_2, \ldots, X_n$, and $G(x)$ is the distribution function of

$$(a_1X_1 + \cdots + a_nX_n)/(b_1X_1 + \cdots + b_nX_n);$$

then, if $P\{\sum b_iX_i < 0\} = 0$,

$$G'(x) = \frac{1}{2\pi i}\int_{-\infty}^{\infty} \sum_{j=1}^{n} b_j \left[\frac{\partial \phi(t_1, \ldots, t_n)}{\partial t_j}\right]_{t_k = t(a_k - b_k x)} dt, \tag{10}$$

in every interval in which the integral converges uniformly.
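If $n = 2$, and

$$\begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

then

$$G'(x) = \frac{1}{2\pi i}\int_{-\infty}^{\infty} \left[\frac{\partial \phi(t_1, t_2)}{\partial t_2}\right]_{t_2 = -t_1 x} dt_1. \tag{11}$$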
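Formula (11) gives the density directly. The minimal Python sketch below uses a hypothetical example: X1 ~ N(0, 1) independent of X2 ~ Exponential(1), so that φ(t1, t2) = exp(−t1²/2)/(1 − it2). It truncates the integral to (−T, T) and applies the midpoint rule. The independent check is G'(x) = ∫₀^∞ y φ_N(xy) e^{−y} dy, where φ_N is the standard normal density. The names, cut-off, and step count are illustrative assumptions.

```python
import math


def ratio_density(x, dphi_dt2, T=12.0, n=40000):
    """G'(x) from formula (11): (1/(2 pi i)) * int [dphi/dt2](t, -t x) dt.

    The integral over (-T, T) is evaluated by the midpoint rule. Only the real
    part is returned, since the exact integral is real for a real density.
    """
    h = 2.0 * T / n
    acc = 0j
    for k in range(n):
        t = -T + (k + 0.5) * h
        acc += dphi_dt2(t, -t * x)
    return (acc * h / (2j * math.pi)).real


# Hypothetical example: phi(t1, t2) = exp(-t1^2 / 2) / (1 - i t2), hence
# dphi/dt2 = i exp(-t1^2 / 2) / (1 - i t2)^2.
def dphi_dt2(t1, t2):
    return 1j * math.exp(-t1 * t1 / 2.0) / (1.0 - 1j * t2) ** 2


def ratio_density_direct(x, n=20000, upper=50.0):
    """Independent check: G'(x) = int_0^inf y * normal_pdf(x y) * exp(-y) dy."""
    h = upper / n
    acc = 0.0
    for k in range(n):
        y = (k + 0.5) * h
        acc += y * math.exp(-y - 0.5 * (x * y) ** 2) / math.sqrt(2.0 * math.pi)
    return acc * h


for x in (-1.5, 0.0, 0.5, 2.5):
    print(f"x = {x:5.2f}  formula (11): {ratio_density(x, dphi_dt2):.6f}"
          f"  direct: {ratio_density_direct(x):.6f}")
```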


Cramér [3, p. 317, exercise 6] states the following result:

If $F(x_1, x_2) = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2} f(u, v)\,du\,dv$ and $F_2(0) = 0$, then

$$G'(x) = \frac{1}{2\pi i}\int_{-\infty}^{\infty} \left[\frac{\partial \phi(t_1, t_2)}{\partial t_2}\right]_{t_2 = -t_1 x} dt_1,$$

if the integral is uniformly convergent with respect to $x$.

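Geary [6] has shown that if $F(x_1, x_2) = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2} f(u, v)\,du\,dv$, $F_2(0) = 0$, and $\lambda(t, v) = \int_{-\infty}^{\infty} e^{itu} f(u, v)\,du$, then

$$G'(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} dt \int_{0}^{\infty} y\,\lambda(t, y)\,e^{-ityx}\,dy,$$

provided

(i) $\lambda(t, y) = 0$ for $t = \pm\infty$,

(ii) $\displaystyle\int_{0}^{\infty} dy \int_{-\infty}^{\infty} y\,\lambda(t, y)\,e^{-ityx}\,dt = \int_{-\infty}^{\infty} dt \int_{0}^{\infty} y\,\lambda(t, y)\,e^{-ityx}\,dy$.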


Formula (1) can be employed in the case $n = 2$, $X_1, X_2$ independent, and

$$\begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

to obtain closed expressions for the distribution functions of ratios in which the variable in the numerator and that in the denominator may have any one of the following four distributions: Binomial, Rectangular, $\chi^2$, Normal. In the case of the four ratios with the binomially distributed variable as the denominator, a translation must be made to ensure positiveness of the denominator. For the four ratios with the normally distributed variable as denominator, the distribution function obtained is approximate, and the approximation is good if $P\{X_2 < 0\}$ is sufficiently small (cf. Geary [5]).


5. Distribution of the ratio $(a_1X_1 + \cdots + a_nX_n)/(b_1X_1 + \cdots + b_nX_n)$, with denominator positive or negative. The following theorem will be proven:

Theorem 2. Let $G(x)$ be the distribution function of $(a_1X_1 + \cdots + a_nX_n)/(b_1X_1 + \cdots + b_nX_n)$, where $a_1, a_2, \ldots, a_n, b_1, b_2, \ldots, b_n$ are real numbers. If $P\{\sum b_iX_i = 0\} = 0$, then

$$G(x) + G(x-0) = 1 - \frac{1}{\pi i}\int_{-\infty}^{\infty} \frac{\phi^{+}[t(a_1 - b_1x), \ldots, t(a_n - b_nx)] + \phi^{-}[t(b_1x - a_1), \ldots, t(b_nx - a_n)]}{t}\,dt, \tag{12}$$

where

$$\phi^{+}(t_1, t_2, \ldots, t_n) = \int\cdots\int_{\sum b_ix_i > 0} e^{i(t_1x_1 + \cdots + t_nx_n)}\,dF(x_1, x_2, \ldots, x_n),$$

$$\phi^{-}(t_1, t_2, \ldots, t_n) = \int\cdots\int_{\sum b_ix_i < 0} e^{i(t_1x_1 + \cdots + t_nx_n)}\,dF(x_1, x_2, \ldots, x_n).$$
Proof. Let

$$R_x(\xi) = P\{\sum b_kX_k > 0\}\,P\{\sum X_k(a_k - b_kx) \le \xi \mid \sum b_kX_k > 0\} + P\{\sum b_kX_k < 0\}\,P\{\sum X_k(a_k - b_kx) \ge -\xi \mid \sum b_kX_k < 0\}.$$

Then $R_x(\infty) = 1$, $R_x(-\infty) = 0$, and $R_x(\xi)$ is non-decreasing in $\xi$ and continuous on the right. Hence $R_x(\xi)$ is a distribution function (Cramér [2, p. 11]). It can be shown by a proof analogous to that used by Curtiss [4] that the characteristic function of $R_x(\xi)$ is

$$\phi^{+}[t(a_1 - b_1x), \ldots, t(a_n - b_nx)] + \phi^{-}[t(b_1x - a_1), \ldots, t(b_nx - a_n)].$$

Since $R_x(0) = G(x)$, application of (2) to $R_x(\xi)$ yields the required result.
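Theorem 2 covers denominators that take both signs, which (1) does not. The Python sketch below evaluates (12) for a hypothetical example chosen so that the answer is known in closed form. Take X1 ~ N(μ, 1) independent of a denominator X2 with P(X2 = 2) = p and P(X2 = −1) = 1 − p. Then φ⁺ and φ⁻ are the two pieces of the joint characteristic function, and G(x) = pΦ(2x − μ) + (1 − p)Φ(x + μ). As before, G is assumed continuous. The integral is folded onto t > 0, which is valid because φ±(−u) is the complex conjugate of φ±(u), then truncated and evaluated by the midpoint rule. All names and parameter values are illustrative assumptions.

```python
import cmath
import math


def theorem2_cdf(x, a, b, phi_plus, phi_minus, T=12.0, n=20000):
    """G(x) from formula (12), assuming G is continuous at x.

    S(t) = phi+[t(a - b x)] + phi-[t(b x - a)] satisfies S(-t) = conj(S(t)), so
    G(x) = 1/2 - (1/pi) * int_0^inf Im S(t) / t dt (truncated at T, midpoint rule).
    """
    h = T / n
    acc = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        u_plus = [t * (ai - bi * x) for ai, bi in zip(a, b)]
        u_minus = [t * (bi * x - ai) for ai, bi in zip(a, b)]
        acc += (phi_plus(*u_plus) + phi_minus(*u_minus)).imag / t
    return 0.5 - acc * h / math.pi


# Hypothetical example for the ratio X1/X2 (a = (1, 0), b = (0, 1)):
# X1 ~ N(MU, 1) independent of X2, with P(X2 = 2) = P_POS and P(X2 = -1) = 1 - P_POS.
MU, P_POS = 0.5, 0.3


def phi_plus(t1, t2):   # integral over the region X2 > 0
    return cmath.exp(1j * MU * t1 - t1 * t1 / 2) * P_POS * cmath.exp(2j * t2)


def phi_minus(t1, t2):  # integral over the region X2 < 0
    return cmath.exp(1j * MU * t1 - t1 * t1 / 2) * (1 - P_POS) * cmath.exp(-1j * t2)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def exact_cdf(x):
    # X1/2 <= x  iff  X1 <= 2x;   X1/(-1) <= x  iff  X1 >= -x.
    return P_POS * normal_cdf(2 * x - MU) + (1 - P_POS) * (1 - normal_cdf(-x - MU))


for x in (-1.5, 0.0, 0.4, 2.0):
    print(x, round(theorem2_cdf(x, (1, 0), (0, 1), phi_plus, phi_minus), 6),
          round(exact_cdf(x), 6))
```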


6. Inversion formulae for multidimensional distribution functions. The $n$-dimensional analogue of (2) will now be given, and will be applied to obtain inversion formulae for the joint distribution of several ratios.

Let $X_1, X_2, \ldots, X_n$ have the joint distribution function $F(x_1, x_2, \ldots, x_n)$ and the corresponding characteristic function $\phi(t_1, t_2, \ldots, t_n)$. Let

$$\phi_{j_1 j_2 \cdots j_k}(t_1, t_2, \ldots, t_k)$$

be the characteristic function corresponding to the marginal joint distribution function of $X_{j_1}, X_{j_2}, \ldots, X_{j_k}$, where the set $j_1, j_2, \ldots, j_k$ is a permutation of $k$ of the integers $1, 2, \ldots, n$. Note that

$$\phi(t_1, t_2, \ldots, t_n) = \phi_{12\cdots n}(t_1, t_2, \ldots, t_n).$$

The summation $\sum_{(i_1, i_2, \ldots, i_n)} F(\xi_{1i_1}, \xi_{2i_2}, \ldots, \xi_{ni_n})$, which will appear below, is to be interpreted as follows. Defining

$$\xi_{ji_j} = \xi_j \ \text{if } i_j = 1, \qquad \xi_{ji_j} = \xi_j - 0 \ \text{if } i_j = 0,$$

the summation is to be taken over all binary numbers $i_1 i_2 \cdots i_n$.

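Using the notation of the preceding paragraph, we can state the following theorem.

Theorem 3. Let $A_0, A_1, \ldots, A_n$ satisfy the $n + 1$ equations

$$\sum_{k=r}^{n} \binom{n-r}{k-r} A_k = 0 \quad (r = 0, 1, 2, \ldots, n-1), \qquad A_n = -1,$$

where $\binom{m}{q}$, as usual, denotes the binomial coefficient. Then

$$(-1)^{n+1}\sum_{(i_1, i_2, \ldots, i_n)} F(\xi_{1i_1}, \xi_{2i_2}, \ldots, \xi_{ni_n}) = A_0 + \sum_{k=1}^{n}\frac{A_k}{(\pi i)^k}\sum_{j_1<j_2<\cdots<j_k}\int_{-\infty}^{\infty}\!\cdots\int_{-\infty}^{\infty} \exp\{-i(t_1\xi_{j_1} + \cdots + t_k\xi_{j_k})\}\,\phi_{j_1j_2\cdots j_k}(t_1, \ldots, t_k)\,\frac{dt_1\,dt_2\cdots dt_k}{t_1t_2\cdots t_k}. \tag{13}$$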
Proof. Since the theorem is already proved for $n = 1$ (section 3), and since

$$\frac{1}{(\pi i)^n}\int_{-\infty}^{\infty}\!\cdots\int_{-\infty}^{\infty} \exp\{-i(t_1\xi_1 + \cdots + t_n\xi_n)\}\,\phi(t_1, t_2, \ldots, t_n)\,\frac{dt_1\,dt_2\cdots dt_n}{t_1t_2\cdots t_n} = \int_{-\infty}^{\infty}\!\cdots\int_{-\infty}^{\infty} \operatorname{sgn}(x_1 - \xi_1)\,\operatorname{sgn}(x_2 - \xi_2)\cdots\operatorname{sgn}(x_n - \xi_n)\,dF(x_1, x_2, \ldots, x_n),$$

the theorem could be proven by induction. The result is obtained more quickly, however, by noting that it suffices to consider the case of independent $X_1, X_2, \ldots, X_n$.
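The proof reduces (13) to the sign identity displayed above. Expanding the product of [1 − sgn(X_j − ξ_j)] over j shows that the k-fold terms enter with the factor (−1)^k. After the factor (−1)^{n+1} on the left of (13) is taken into account, they therefore carry the sign (−1)^{n+1+k}, which is −1 for k = n. The Python sketch below checks this expansion in exact rational arithmetic for a small discrete distribution, chosen hypothetically so that atoms sit on the evaluation points and F(ξ − 0) differs from F(ξ). It does not evaluate the integrals themselves. Each integral is replaced by the expectation it equals.

```python
from fractions import Fraction
from itertools import combinations, product


def sgn(v):
    return (v > 0) - (v < 0)


def binary_sum_of_F(atoms, xi):
    """Sum over all binary i_1...i_n of F(xi_{1 i_1}, ..., xi_{n i_n}), where
    xi_{j1} = xi_j (event X_j <= xi_j) and xi_{j0} = xi_j - 0 (event X_j < xi_j).
    `atoms` is a list of (probability, point) pairs of a discrete distribution."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(xi)):
        for prob, point in atoms:
            if all((v <= x) if i else (v < x) for v, x, i in zip(point, xi, bits)):
                total += prob
    return total


def sign_expansion(atoms, xi):
    """1 + sum over nonempty index sets S of (-1)^|S| E[prod_{j in S} sgn(X_j - xi_j)].
    By the identity displayed in the proof, applied to the marginal of X_S, each
    expectation equals the |S|-fold integral with phi_S that appears in (13)."""
    n = len(xi)
    total = Fraction(1)
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            expectation = Fraction(0)
            for prob, point in atoms:
                s = 1
                for j in subset:
                    s *= sgn(point[j] - xi[j])
                expectation += prob * s
            total += (-1) ** k * expectation
    return total


# Hypothetical discrete distribution with atoms on grid points.
ATOMS = [
    (Fraction(1, 4), (0, 0, 0)),
    (Fraction(1, 8), (1, 0, 2)),
    (Fraction(3, 8), (0, 1, 1)),
    (Fraction(1, 4), (2, 2, 0)),
]

xi = (0, 1, 1)
print(binary_sum_of_F(ATOMS, xi), sign_expansion(ATOMS, xi))
```

At a point above every atom, such as (5, 5, 5), both sides equal 2³ = 8. This matches the remark that at a continuity point the binary sum is 2^n F.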

It may be remarked that if $(\xi_1, \xi_2, \ldots, \xi_n)$ is a continuity point of $F(x_1, x_2, \ldots, x_n)$, the left-hand member of (13) becomes

$$(-1)^{n+1}\,2^n F(\xi_1, \xi_2, \ldots, \xi_n),$$

and also that differentiation of (13) yields

$$\frac{\partial^n F(\xi_1, \xi_2, \ldots, \xi_n)}{\partial\xi_1\,\partial\xi_2\cdots\partial\xi_n} = \left(\frac{1}{2\pi}\right)^n \int_{-\infty}^{\infty}\!\cdots\int_{-\infty}^{\infty} \exp\{-i(t_1\xi_1 + \cdots + t_n\xi_n)\}\,\phi(t_1, t_2, \ldots, t_n)\,dt_1\,dt_2\cdots dt_n \tag{14}$$

in every $n$-dimensional interval in which the integral converges uniformly. This agrees with well-known results concerning Fourier inversion formulae.

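An inversion formula for the joint distribution of $p$ ratios

$$\frac{a_{1i}X_1 + a_{2i}X_2 + \cdots + a_{ni}X_n}{b_{1i}X_1 + b_{2i}X_2 + \cdots + b_{ni}X_n}, \qquad i = 1, 2, \ldots, p \ (1 \le p \le n),$$

can be obtained from (13) by a method similar to that applied in section 4. The following theorem holds:

Theorem 4. Let $G(\xi_1, \ldots, \xi_p)$ be the joint distribution function of these $p$ ratios, let $A_0, A_1, \ldots, A_p$ satisfy the equations of Theorem 3 with $n$ replaced by $p$, and let $\phi(t_1, t_2, \ldots, t_n)$ be the characteristic function corresponding to $X_1, X_2, \ldots, X_n$. Then, if $P\{\sum_{\nu} b_{\nu k}X_\nu < 0\} = 0$ $(k = 1, 2, \ldots, p)$,

$$(-1)^{p+1}\sum_{(i_1, \ldots, i_p)} G(\xi_{1i_1}, \xi_{2i_2}, \ldots, \xi_{pi_p}) = A_0 + \sum_{k=1}^{p}\frac{A_k}{(\pi i)^k}\sum_{j_1<\cdots<j_k}\int_{-\infty}^{\infty}\!\cdots\int_{-\infty}^{\infty} \phi\Big[\sum_{q=1}^{k}\tau_q(a_{1j_q} - b_{1j_q}\xi_{j_q}), \ldots, \sum_{q=1}^{k}\tau_q(a_{nj_q} - b_{nj_q}\xi_{j_q})\Big]\frac{d\tau_1\cdots d\tau_k}{\tau_1\cdots\tau_k}. \tag{15}$$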
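The following corollary is a generalization of (10) and follows by differentiation of (15).

Corollary. Suppose $G(x_1, x_2, \ldots, x_p)$ is the joint distribution function of the $p$ ratios

$$\frac{a_{1i}X_1 + \cdots + a_{ni}X_n}{b_{1i}X_1 + \cdots + b_{ni}X_n}, \qquad i = 1, 2, \ldots, p,$$

and $\phi(t_1, t_2, \ldots, t_n)$ is the characteristic function corresponding to $X_1, X_2, \ldots, X_n$. Then, if $P\{\sum_{\nu=1}^{n} b_{\nu j}X_\nu < 0\} = 0$, $j = 1, 2, \ldots, p$,

$$\frac{\partial^p G(x_1, \ldots, x_p)}{\partial x_1\,\partial x_2\cdots\partial x_p} = \left(\frac{1}{2\pi i}\right)^p \int_{-\infty}^{\infty}\!\cdots\int_{-\infty}^{\infty} \sum_{k_1, \ldots, k_p} b_{k_1 1}b_{k_2 2}\cdots b_{k_p p}\left[\frac{\partial^p\phi(t_1, \ldots, t_n)}{\partial t_{k_1}\,\partial t_{k_2}\cdots\partial t_{k_p}}\right]_{t_k = \sum_{i=1}^{p}\tau_i(a_{ki} - b_{ki}x_i)} d\tau_1\,d\tau_2\cdots d\tau_p, \tag{16}$$

in every $p$-dimensional interval in which the integral converges uniformly.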


7. Joint distribution of ratios of quadratic forms. Let $X_1, X_2, \ldots, X_n$ have the joint probability density function

$$f(x) = \frac{(\det B)^{1/2}}{(2\pi)^{n/2}}\,e^{-\frac{1}{2}xBx'},$$

where $x = (x_1, \ldots, x_n)$ and $B$ is a positive definite symmetric matrix. Suppose $Q$ is a positive semi-definite symmetric matrix of rank $r < n$ and $L_1, L_2, \ldots, L_p$ is a set of symmetric matrices. We wish to obtain the joint distribution function $G(\xi_1, \xi_2, \ldots, \xi_p)$ of the $p$ ratios

$$\frac{XL_1X'}{XQX'}, \quad \frac{XL_2X'}{XQX'}, \quad \ldots, \quad \frac{XL_pX'}{XQX'},$$

where $X = (X_1, X_2, \ldots, X_n)$.

The existence of such an orthogonal matrix $S$ that $SQS' = I^{(r)}$, where $I^{(r)}$ is the diagonal matrix having the first $r$ diagonal elements equal to unity and the rest equal to zero, is well known. Let $X = YS$, $C = SBS'$, $M_i = SL_iS'$, and note that $C$ and the $M_i$ are symmetric matrices. Also

$$G(\xi_1, \xi_2, \ldots, \xi_p) = P\left\{\frac{YM_1Y'}{YI^{(r)}Y'} \le \xi_1, \ldots, \frac{YM_pY'}{YI^{(r)}Y'} \le \xi_p\right\},$$

where $Y = (Y_1, Y_2, \ldots, Y_n)$ has the probability density function

$$g(y) = \frac{(\det C)^{1/2}}{(2\pi)^{n/2}}\,e^{-\frac{1}{2}yCy'},$$

and $y = (y_1, y_2, \ldots, y_n)$.

Suppose the $L_i$ mutually commute in pairs. Then so do the $M_i$; for $M_iM_j = SL_iS'SL_jS' = SL_iL_jS' = SL_jL_iS' = SL_jS'SL_iS' = M_jM_i$, since $S$ is orthogonal. Hence, there is an orthogonal matrix $U$ which simultaneously reduces each $M_i$ to diagonal form; that is, $N_i = UM_iU'$ is a diagonal matrix (cf. Weyl [8, p. 25]).

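Let $Y = ZU$, $D = UCU'$, so that

$$G(\xi_1, \xi_2, \ldots, \xi_p) = P\left\{\frac{ZN_1Z'}{\sum_{j=1}^{n} v_j^{(r)}Z_j^2} \le \xi_1, \ldots, \frac{ZN_pZ'}{\sum_{j=1}^{n} v_j^{(r)}Z_j^2} \le \xi_p\right\},$$

where $v_j^{(r)} = 1$ if $j \le r$ and $v_j^{(r)} = 0$ if $j > r$, and $Z = (Z_1, Z_2, \ldots, Z_n)$ has the probability density function

$$h(z) = \frac{(\det D)^{1/2}}{(2\pi)^{n/2}}\,e^{-\frac{1}{2}zDz'},$$

where $z = (z_1, z_2, \ldots, z_n)$.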

We can now apply the results of section 6. If $\psi(t_1, t_2, \ldots, t_n)$ is the characteristic function corresponding to the joint distribution function of $Z_1^2, Z_2^2, \ldots, Z_n^2$, it is clear that

$$\psi(t_1, t_2, \ldots, t_n) = \left[\frac{\det D}{\det(D - 2iT)}\right]^{1/2},$$

where $T$ is the diagonal matrix whose diagonal elements are $t_1, t_2, \ldots, t_n$. Applying (15), with $\phi = \psi$, we obtain, since $G$ is obviously a continuous distribution function,


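$$(-1)^{p+1}\,2^p\,G(\xi_1, \xi_2, \ldots, \xi_p) = A_0 + \sum_{k=1}^{p}\frac{A_k}{(\pi i)^k}\sum_{j_1<j_2<\cdots<j_k}\int_{-\infty}^{\infty}\!\cdots\int_{-\infty}^{\infty}\left\{\frac{\det D}{\det\big[d_{\alpha\beta} - 2i\,\delta_{\alpha\beta}\sum_{q=1}^{k} w_q\,(\nu_{j_q\alpha} - v_\alpha^{(r)}\xi_{j_q})\big]}\right\}^{1/2}\frac{dw_1\,dw_2\cdots dw_k}{w_1w_2\cdots w_k}, \tag{17}$$

where $D = [d_{\alpha\beta}]$, $\nu_{i\alpha}$ denotes the $\alpha$th diagonal element of $N_i$, and $\delta_{\alpha\beta}$ is the Kronecker delta.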
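The characteristic function ψ that enters (17) can be checked numerically. In the Python/NumPy sketch below, D and t are hypothetical values. ψ is computed from eigenvalues rather than from the determinant ratio, which sidesteps the choice of branch for the complex square root. For the moderate t used here it agrees with [det D/det(D − 2iT)]^{1/2} taken on the principal branch. It also agrees with a Monte Carlo average of exp(iΣ t_j Z_j²) for Z ~ N(0, D⁻¹), which is the distribution with density h(z).

```python
import numpy as np


def quad_cf(t, D):
    """Characteristic function psi(t_1, ..., t_n) of (Z_1^2, ..., Z_n^2) when Z has
    density proportional to exp(-z D z' / 2), i.e. Z ~ N(0, D^{-1}).

    Mathematically psi = [det D / det(D - 2iT)]^{1/2} with T = diag(t). Here it is
    computed as prod_k (1 - 2i lam_k)^(-1/2), where lam_k are the real eigenvalues
    of L' T L and D^{-1} = L L' is a Cholesky factorization. Each factor has
    positive real part, so the principal square root is the right one.
    """
    L = np.linalg.cholesky(np.linalg.inv(D))
    lam = np.linalg.eigvalsh(L.T @ np.diag(t) @ L)
    return complex(np.prod((1 - 2j * lam) ** -0.5))


# Hypothetical positive definite D and argument t.
D = np.array([[2.0, 0.5], [0.5, 1.0]])
t = np.array([0.3, -0.7])

det_form = np.sqrt(np.linalg.det(D) / np.linalg.det(D - 2j * np.diag(t)))

rng = np.random.default_rng(0)
z = rng.multivariate_normal(np.zeros(2), np.linalg.inv(D), size=200_000)
monte_carlo = np.mean(np.exp(1j * (z ** 2 @ t)))

print(quad_cf(t, D), det_form, monte_carlo)
```

The eigenvalue form matters for larger |t|. There the arguments of the factors 1 − 2iλ_k can add up past ±π, and the principal square root of the determinant ratio then has the wrong sign. The factor-by-factor product stays continuous in t.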

It is, of course, evident that a result analogous to (17) could be obtained by considering $p$ ratios

$$\frac{XL_1X'}{XQ_1X'}, \quad \frac{XL_2X'}{XQ_2X'}, \quad \ldots, \quad \frac{XL_pX'}{XQ_pX'},$$

where the $2p$ matrices $L_1, L_2, \ldots, L_p, Q_1, Q_2, \ldots, Q_p$ are symmetric and mutually commute in pairs, and $Q_1, Q_2, \ldots, Q_p$ are positive semi-definite.

In the case $p = 1$ in (17), and for special classes of matrices $L_1$, $Q_1$, $B$, the calculus of residues may be employed to obtain closed expressions for the distribution of

$$\frac{XL_1X'}{XQ_1X'}.$$

Formula (17) can be applied to obtain the joint distribution of serial correlation coefficients with different lags. The author plans to incorporate these results with those mentioned at the end of section 4 in a forthcoming paper, written jointly with Roy B. Leipnik.

The author wishes to acknowledge the valuable criticism of Professor H. Lewy, and especially the constructive advice and suggestions of Roy B. Leipnik.
THE FACTORIAL APPROACH TO THE WEIGHING PROBLEM¹


By O. Kempthorne
Iowa State College

1. Summary. The weighing problem is discussed from the point of view of factorial experimentation. The paper contains a brief description of the fractional replication of the $2^n$ factorial system. It is shown that optimum designs for the weighing problem may easily be obtained with this approach. This approach is valuable in indicating the structure of weighing problem designs, and the limited conditions under which such designs can give results of value.

2. Introduction. Considerable attention has been given recently to the problem of weighing a number of light objects on a scale [1, 2, 3, 6]. The problem was originally proposed by Yates in his paper on complex experiments [4] as an example of a factorial experiment in which interactions between the factors tested would not be expected to exist; that is, the weight of, say, two objects could be assumed to be the sum of the weights of the objects weighed separately, after taking account of any necessary zero corrections. Such a situation is comparatively rare in biological research where, for example, the effect on yield of a particular crop from the joint application of two nutrients is usually different from the sum of the effects of separate applications. In recent years attention has been given to the use of fractional replication in factorial experiments [7, 8, 9], and it is proposed in this paper to consider the weighing problem from this point of view.


3. The $2^n$ factorial system. A full description of the $2^n$ factorial system was given by Yates in his technical communication The Design and Analysis of Factorial Experiments [5]. Yates was particularly concerned with the analysis of such experiments and with the evolution of systems of confounding in order to reduce the number of plots in each block. The following brief account is given in order to facilitate the discussion of the weighing problem.

In a single replication of the $2^n$ system all combinations of $n$ factors, each at two levels, are tested. With three factors a, b, c, for example, the following eight combinations are tested: (1), a, b, ab, c, ac, bc, and abc. Here (1) denotes the control, a the application of treatment a only, ab the application of treatments a and b, and so on. A set of seven independent comparisons between the eight test results is given formally by the expansion of the formula

$$\tfrac{1}{4}(a \pm 1)(b \pm 1)(c \pm 1),$$


¹ Journal Paper No. J 1548 of the Iowa Agricultural Experiment Station, Ames, Iowa. Project No. 890.
where at least one of the signs is negative. If, for instance, the first sign only is taken to be negative, a formal expansion gives the expression

$$\tfrac{1}{4}\{abc - bc + ab - b + ac - c + a - (1)\},$$

and this contrast of the observations gives the effect of the factor a averaged over the presence and absence of the factors b and c, which is denoted by effect A. Similarly, taking the negative sign in the second bracket only, we get the average effect B,

$$B = \tfrac{1}{4}\{abc - ac + ab - a + bc - c + b - (1)\}.$$

Taking negative signs in the first and second brackets, we obtain the interaction AB,

$$AB = \tfrac{1}{4}\{abc + c + ab + (1) - ac - bc - a - b\},$$

and so on.

The definition of effects and interactions may be presented very simply in geometrical terminology, by representing the treatment combinations as points of an $n$-dimensional lattice, each axis of the lattice having two points at unit distance apart. The control treatment will have coordinates $(0, 0, 0, \ldots, 0)$, the treatment consisting of a only will have coordinates $(1, 0, 0, \ldots, 0)$, and so on. The effect A is then the difference of the mean yield of the treatments corresponding to the points lying on the hyperplane

$$x_1 = 0$$

and the mean yield of those represented by points lying on the hyperplane

$$x_1 = 1.$$


The interaction of two factors a and b, represented by the axes $x_1$ and $x_2$ respectively, will be obtained from the difference of the mean yields of those plots for which

$$x_1 + x_2 = 0 \quad\text{or}\quad x_1 + x_2 = 2,$$

and those for which

$$x_1 + x_2 = 1.$$


The extensions of the above to three-factor and higher-order interactions are simple. The interaction of factors a, b, and c, which are represented by coordinate axes $x_1$, $x_2$, and $x_3$, is given by the difference between the mean of plots represented by points for which

$$x_1 + x_2 + x_3 = 0 \text{ or } 2,$$

and those represented by points for which

$$x_1 + x_2 + x_3 = 1 \text{ or } 3;$$

in other words, it is the difference of the mean yields of those plots for which

$$x_1 + x_2 + x_3 = 0 \pmod 2$$

and of those for which

$$x_1 + x_2 + x_3 = 1 \pmod 2.$$

Each effect or interaction is then defined as the mean difference of two sets of plots, each set being represented by points on parallel hyperplanes, with the planes of one set of parallel hyperplanes lying between the planes of the other set. It is necessary to specify only the direction cosines of the hyperplanes in order to specify the effect or interaction, and the usual terminology for effects and interactions follows: the interaction of factors a, b, c, for example, may be represented by the symbol ABC.
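The sign expansions and the mod 2 description define the same contrasts. The Python sketch below computes an effect or interaction as the difference of mean yields on the two sets of parallel hyperplanes. Because the text speaks only of a "difference", the sign convention here is an assumption chosen to match the ¼(a ± 1)(b ± 1)(c ± 1) expansions: the positive set consists of the runs with Σ_{j∈S} x_j ≡ |S| (mod 2). The yields come from a hypothetical model with an AB interaction term of 1.5. The contrasts written out from the expansions are included as a cross-check.

```python
from itertools import product


def effect(yields, factors):
    """Effect or interaction of the factors with indices `factors` in a full 2^n
    experiment: mean yield of the runs with sum_{j in factors} x_j congruent to
    len(factors) (mod 2), minus the mean yield of the remaining runs."""
    plus, minus = [], []
    for x, y in yields.items():
        parity = sum(x[j] for j in factors) % 2
        (plus if parity == len(factors) % 2 else minus).append(y)
    return sum(plus) / len(plus) - sum(minus) / len(minus)


def label(x, names="abc"):
    return "".join(c for c, xi in zip(names, x) if xi) or "(1)"


# Hypothetical yields: additive effects plus an a-by-b interaction term.
def model(x):
    xa, xb, xc = x
    return 10 + 2 * xa + 1 * xb - 3 * xc + 1.5 * xa * xb


yields = {x: model(x) for x in product((0, 1), repeat=3)}
y = {label(x): v for x, v in yields.items()}

# The same contrasts written out as in the expansion of (1/4)(a +- 1)(b +- 1)(c +- 1).
A_formula = (y["abc"] - y["bc"] + y["ab"] - y["b"] + y["ac"] - y["c"] + y["a"] - y["(1)"]) / 4
AB_formula = (y["abc"] + y["c"] + y["ab"] + y["(1)"] - y["ac"] - y["bc"] - y["a"] - y["b"]) / 4

print("A   =", effect(yields, [0]), A_formula)
print("AB  =", effect(yields, [0, 1]), AB_formula)
print("ABC =", effect(yields, [0, 1, 2]))
```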

In the same way as effects and interactions are defined in terms of the yields of the several treatment combinations, the expected yield from each treatment combination may be expressed in terms of the mean level of yield and the true effects and interactions. If the full set of combinations of the factorial scheme is tested, the best estimate of each true effect and interaction is the same function of the observed yields that the true effect or interaction is of the true yields. This fact is one of the advantages which follow from the use of the full factorial scheme.

We are not concerned here with factorial experiments in which the factors have more than two levels. When the number of levels of each factor is the same prime number, however, effects and interactions may be represented by products of powers of the symbols for the factors. In the case of two factors (a, b) at three levels, for example, the main effects may be represented by A and B, and the interactions by AB and AB². Each symbol refers to the two independent contrasts between three sets, each of three plots.

As an example of the use of the above representation, we may consider confounding, that is, the arrangement of the treatment combinations in blocks in order to reduce the experimental error. Suitable arrangements are such that contrasts between the blocks represent high-order interactions which the experimenter is confident will be of negligible size.

If treatment combinations for which

$$\alpha_1x_1 + \alpha_2x_2 + \cdots + \alpha_nx_n = 0 \pmod 2$$

and for which

$$\beta_1x_1 + \beta_2x_2 + \cdots + \beta_nx_n = 0 \pmod 2$$

are arranged in a particular block, then the coordinates of the treatment combinations in this block also lie on the hyperplane

$$(\alpha_1 + \beta_1)x_1 + (\alpha_2 + \beta_2)x_2 + \cdots + (\alpha_n + \beta_n)x_n = 0 \pmod 2,$$

where the coefficients $(\alpha_i + \beta_i)$ must be reduced modulo two. If, therefore, the treatments are arranged in blocks so that two comparisons are block contrasts, then the generalised interaction of these contrasts is also a block contrast.
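The block argument can be checked by enumeration. The Python sketch below uses a hypothetical 2⁴ example in which ABC and BCD are confounded with blocks. Every combination in the block where both equal 0 (mod 2) also lies on the hyperplane of their generalised interaction, whose coefficients reduce modulo two to those of AD.

```python
from itertools import product


def on_hyperplane(x, coeffs):
    """True if coeffs . x = 0 (mod 2)."""
    return sum(c * xi for c, xi in zip(coeffs, x)) % 2 == 0


def generalized_interaction(alpha, beta):
    """Coefficients (alpha_i + beta_i) reduced modulo two."""
    return tuple((a + b) % 2 for a, b in zip(alpha, beta))


def symbol(coeffs, letters="ABCD"):
    return "".join(L for L, c in zip(letters, coeffs) if c)


# Hypothetical 2^4 example: ABC and BCD confounded with blocks.
alpha = (1, 1, 1, 0)   # ABC
beta = (0, 1, 1, 1)    # BCD
gamma = generalized_interaction(alpha, beta)

block = [x for x in product((0, 1), repeat=4)
         if on_hyperplane(x, alpha) and on_hyperplane(x, beta)]

print(symbol(gamma), block)
print(all(on_hyperplane(x, gamma) for x in block))
```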
4. Fractional replication. The principle of fractional replication follows very simply. Suppose only those treatment combinations whose yields all occur either in the positive or in the negative part of a particular contrast are represented in the experiment, that is, only those combinations represented by the points of the lattice for which, say,

$$\alpha_1x_1 + \alpha_2x_2 + \cdots + \alpha_nx_n = 0 \pmod 2.$$

Then the comparison between the yields of those plots represented by

$$\beta_1x_1 + \beta_2x_2 + \cdots + \beta_nx_n = 0 \pmod 2$$

and by

$$\beta_1x_1 + \beta_2x_2 + \cdots + \beta_nx_n = 1 \pmod 2$$

will be identical with the comparison between the yields of plots represented by

$$(\alpha_1 + \beta_1)x_1 + (\alpha_2 + \beta_2)x_2 + \cdots + (\alpha_n + \beta_n)x_n = 0 \pmod 2$$

and by

$$(\alpha_1 + \beta_1)x_1 + (\alpha_2 + \beta_2)x_2 + \cdots + (\alpha_n + \beta_n)x_n = 1 \pmod 2.$$

The former of these two comparisons may be represented by the symbol $x_1^{\beta_1}x_2^{\beta_2}\cdots x_n^{\beta_n}$, and the latter by $x_1^{\alpha_1+\beta_1}x_2^{\alpha_2+\beta_2}\cdots x_n^{\alpha_n+\beta_n}$. Here $x_1, \ldots, x_n$ are no longer coordinates but symbols for the $n$ factors, which satisfy the relations $x_i^{\alpha} = 1$ if $\alpha = 0 \pmod 2$. The equivalence of the two comparisons may be obtained by the use of an identity relationship in the symbols for the factors,

$$I = x_1^{\alpha_1}x_2^{\alpha_2}\cdots x_n^{\alpha_n},$$

where $I$ is interpreted as unity, and only those combinations whose coordinates $(x_1, x_2, \ldots, x_n)$ satisfy one of the equations

$$\alpha_1x_1 + \alpha_2x_2 + \cdots + \alpha_nx_n = 0, \quad\text{or}\quad = 1 \pmod 2,$$

are represented in the experiment. If this identity relationship is multiplied by the symbol $x_1^{\beta_1}x_2^{\beta_2}\cdots x_n^{\beta_n}$ by ordinary commutative algebra, reducing the powers modulo 2 where necessary, we obtain

$$x_1^{\beta_1}x_2^{\beta_2}\cdots x_n^{\beta_n} = x_1^{\alpha_1+\beta_1}x_2^{\alpha_2+\beta_2}\cdots x_n^{\alpha_n+\beta_n}.$$

It is more convenient to revert to the common use of capital letters A, B, C, etc. for effects corresponding to small letters a, b, c, etc. for the factors tested. An experiment in half-replicate is then represented formally by an equation of the type

$$I = A^{\alpha_1}B^{\alpha_2}C^{\alpha_3}\cdots.$$

In such an experiment on $n$ factors only $2^{n-1}$ treatment combinations will be tested. Of the $2^n - 1$ independent comparisons in a fully replicated experiment, information on one comparison is lost completely, since only those treatments which appear in the comparison with the same sign are represented. The remaining $2^n - 2$ independent comparisons of a fully replicated experiment are identical in pairs, giving $2^{n-1} - 1$ independent comparisons. Each comparison is then said to have two aliases and measures the sum (or difference, depending on which half of the treatment combinations is used) of two effects, an effect and an interaction, or two interactions.

A quarter-replicated experiment can by the same process be represented by an identity relationship of the form

$$I = A^{\alpha_1}B^{\alpha_2}\cdots = A^{\beta_1}B^{\beta_2}\cdots = A^{\alpha_1+\beta_1}B^{\alpha_2+\beta_2}\cdots.$$

It is useful in the evolution of fractional designs to note that the elements in the 
identity relationship form an Abelian group. 

Fractionally replicated experiments are formally identical with confounded experiments, in that block differences may be regarded as additional factors in the confounded experiment. A $2^n$ experiment arranged in $2^p$ blocks, for example, may be regarded as a 1 in $2^p$ design of a $2^{n+p}$ experiment. Considerable care needs to be exercised in the use of fractionally replicated designs, but they have been found to be very useful in agricultural and biological research.

5. The weighing problem. The problem of weighing a number of objects may be regarded as the problem of estimating the effects of a number of factors which do not interact. To take a simple case, consider the estimation of the effects of factors a, b, and c, for which one complete replicate would consist of the combinations

(1), a, b, ab, c, ac, bc, and abc.

Suppose a half-replicate design is used, based on the identity relationship

$$I = ABC.$$

The combinations tested would then consist of either the set {a, b, c, abc} or the set {(1), ab, ac, bc}. If the former set were chosen, the comparison estimating the effect A could also be ascribed to the interaction BC, that estimating effect B to the interaction AC, and that estimating effect C to the interaction AB. This can be observed by multiplying the identity relationship by A, B, and C in turn. If the experimenter is confident that the two-factor interactions are negligible, then any effect given by each comparison would be ascribed to the main effect.
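Multiplying symbols with the square of each letter equated to unity amounts to taking the symmetric difference of their letter sets. The minimal Python sketch below recovers the aliases of A, B, and C under I = ABC, together with the two half-replicates. The function names are hypothetical.

```python
from itertools import product


def multiply(w1, w2):
    """Product of two effect symbols with the square of each letter equated to
    unity: letters present in both cancel (symmetric difference). The empty
    product is written I."""
    letters = set(w1.replace("I", "")) ^ set(w2.replace("I", ""))
    return "".join(sorted(letters)) or "I"


def aliases(effect, identity_words):
    return [multiply(effect, w) for w in identity_words]


IDENTITY = ["ABC"]  # half replicate I = ABC

for e in ("A", "B", "C"):
    print(e, "=", " = ".join(aliases(e, IDENTITY)))

# The two halves: combinations with an odd / even number of letters in common with abc.
combos = ["".join(c for c, on in zip("abc", bits) if on) or "(1)"
          for bits in product((0, 1), repeat=3)]
former = [c for c in combos if len(set(c) & set("abc")) % 2 == 1]
latter = [c for c in combos if len(set(c) & set("abc")) % 2 == 0]
print(former, latter)
```

Applying the same `multiply` rule to every element of a larger identity group gives the full alias sets of a 1 in 2^p fraction.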

6. Discussion of a particular case. We give the derivation of a design for weighing a particular number of objects, say ten. Let the objects be denoted by a, b, c, d, e, f, g, h, k, l. Then the total number of combinations which could be tested is $2^{10}$, that is 1024, but as we are confident that interactions are negligible, it is necessary only to estimate main effects.

A fractionally replicated design must consist of a number $2^p$ of combinations, and this will be a 1 in $2^{10-p}$ design. A suitable fractionally replicated design consisting of 16 combinations will exist if it is possible to evolve an identity relationship for a 1 in 64 design such that each term in the relationship involves at least three letters. A possible identity relationship for such a design contains the members of the Abelian group obtained from all combinations of the elements I, ABC, CDE, EFG, GHK, ADL, and AFH, with the rule that the square of each letter is to be equated to unity. Each possible comparison may then be due to any of the 64 effects or interactions which may be derived from this identity relationship. In other words, each comparison has 64 aliases. In the case of ten of the comparisons only one of the aliases is a main effect, and for the remaining five comparisons the aliases are all interactions of at least two factors. The actual design may be written down by finding combinations of the letters which have an even number of letters in common with each of the six three-factor interactions. These are themselves a group consisting of all combinations of unity and four combinations of letters. The sixteen combinations with an even number of letters in common with all the members of the identity group are found to be the following:


(1)       abdef     acefl     bcdl
abfgkl    degkl     bcegk     acdfgk
fgh       abdegh    aceghl    bcdfghl
abhkl     defhkl    bcefhk    acdhk


The estimation of effects from the results of the sixteen weighings is particularly easy: the weight of object a will be one-eighth of the difference between the total of those weighings containing a and the total of those not containing a. There are ten such contrasts which estimate the effects, and the remaining five contrasts may be used to obtain an estimate of the experimental error. If $\sigma^2$ is the variance of each weighing, the variance of the weight of a, that is, the effect A, will be $(1/8 + 1/8)\sigma^2 = (1/4)\sigma^2$. The precision can be increased fourfold in the weighing problem with a chemical balance by interpreting the absence of each letter as the placing of the object in the left-hand pan and its presence as the placing of the object in the right-hand pan. Each effect will then measure twice the weight of the corresponding object, and the estimated weight of each object will have a variance of $\sigma^2/16$. This is the same precision as if each object had been weighed by itself eight times in each pan, or sixteen times in all.
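The construction can be carried out by machine. The Python sketch below performs three steps. It generates the identity group from ABC, CDE, EFG, GHK, ADL, and AFH. It then selects the combinations with an even number of letters in common with every generator; these can be compared with the sixteen combinations listed above. Finally, it applies the one-eighth rule to hypothetical noise-free spring-balance readings, with true weights 1 to 10 and a zero error of 5, both assumptions for illustration. The bias cancels because every object appears in exactly eight weighings. Running the sketch also lists the two-factor aliases of A as BC, DL, EK, and FH. EK arises because AEK = EFG·GHK·AFH belongs to the identity group.

```python
from itertools import combinations, product

OBJECTS = "abcdefghkl"
GENERATORS = ["abc", "cde", "efg", "ghk", "adl", "afh"]  # ABC, CDE, EFG, GHK, ADL, AFH


def identity_group(generators):
    """All products of the generators, squares equated to unity (64 words)."""
    words = set()
    for r in range(len(generators) + 1):
        for subset in combinations(generators, r):
            word = set()
            for g in subset:
                word ^= set(g)
            words.add(frozenset(word))
    return words


def design(objects, generators):
    """Combinations with an even number of letters in common with every generator."""
    runs = []
    for bits in product((0, 1), repeat=len(objects)):
        run = frozenset(o for o, on in zip(objects, bits) if on)
        if all(len(run & set(g)) % 2 == 0 for g in generators):
            runs.append(run)
    return runs


def name(run):
    return "".join(sorted(run)) or "(1)"


group = identity_group(GENERATORS)
runs = design(OBJECTS, GENERATORS)
shortest = min(len(w) for w in group if w)
two_factor_aliases_of_a = sorted(name(w - {"a"}) for w in group if "a" in w and len(w) == 3)

# Hypothetical spring-balance readings: true weights 1, ..., 10 plus a zero error.
WEIGHTS = dict(zip(OBJECTS, range(1, 11)))
BIAS = 5
readings = [BIAS + sum(WEIGHTS[o] for o in run) for run in runs]


def estimate(obj):
    """One-eighth of (total of weighings containing obj - total of the others)."""
    with_obj = sum(y for run, y in zip(runs, readings) if obj in run)
    without = sum(y for run, y in zip(runs, readings) if obj not in run)
    return (with_obj - without) / 8


print(len(group), len(runs), shortest)
print(sorted(name(r) for r in runs))
print("A is aliased with", two_factor_aliases_of_a)
print({o: estimate(o) for o in OBJECTS})
```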

7. General case. The rules by which fractional designs may be constructed have been exemplified above, and the procedure is simple, though laborious in the case of a large number of objects. It does not, therefore, seem worth while to enumerate particular designs for the weighing of particular numbers of objects. A general procedure in considering the design for a particular problem is as follows. Taking the case of a number $n$ of objects, the experimenter should form a rough idea of the order of magnitude of the experimental error, $\sigma$ say, and decide what accuracy he requires for his estimates of the weights, a standard error of $s$, say. Then if he weighs $2^p$ combinations of the objects, the standard error of the estimate of each weight will be $2^{-p/2}\sigma$ in the case of the chemical balance. This serves to determine $2^{p/2}$ and therefore $p$, and it is then necessary to design a $2^n$ experiment of fraction $1/2^{n-p}$. Alternatively, a design of higher fraction which can provide estimates may be replicated the corresponding number of times. In the case of the spring balance the corresponding standard error is $2^{-(p-2)/2}\sigma$, necessitating a design of higher fraction.
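The procedure above amounts to a small calculation. The Python sketch below finds the smallest p whose standard error, 2^{−p/2}σ for the chemical balance or 2^{−(p−2)/2}σ for the spring balance, does not exceed the required s. It adds one constraint that the text does not state: a fraction that estimates all n main effects needs 2^p − 1 ≥ n. The function name and example numbers are hypothetical.

```python
import math


def smallest_p(n_objects, sigma, s, balance="chemical"):
    """Smallest p such that 2^p weighings give a standard error of at most s.

    Uses 2^(-p/2) * sigma (chemical balance) and 2^(-(p-2)/2) * sigma (any other
    value of `balance` is treated as a spring balance). Also requires
    2^p - 1 >= n_objects, an added assumption so that all main effects
    can be estimated from the fraction.
    """
    extra = 0 if balance == "chemical" else 2
    p = max(0, math.ceil(2 * math.log2(sigma / s)) + extra)
    while 2 ** p - 1 < n_objects:
        p += 1
    return p


for balance in ("chemical", "spring"):
    p = smallest_p(10, sigma=1.0, s=0.25, balance=balance)
    se = 2 ** (-(p - (0 if balance == "chemical" else 2)) / 2)
    print(f"{balance:8s}: p = {p}, {2 ** p} weighings, fraction 1/2^{10 - p}, s.e. = {se} sigma")
```

With ten objects and s = σ/4, the chemical balance needs the sixteen weighings of section 6, while the spring balance needs sixty-four.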

Designs of the type described above have some useful properties: 

(1) the design automatically takes care of any bias in the balance, 

(2) the effects or weights may be computed easily as indicated above, 

(3) the effects or weights are uncorrelated, 

(4) all the effects are measured with the same precision, and 

(5) an estimate of the experimental error which is independent of the effects may be computed from the results.

In considering the use for a particular problem of a design like the one discussed, 
it is important to understand completely the structure of the design. Such 
designs may well have real value for the weighing problem, but it is not easy to 
visualize other problems for which they would not give results capable of various 
interpretations. The use of the above designs depends on an assumption that 
interactions between pairs of factors are negligible, and this is generally not the 
case, for example, in biological research work, in which the experimenter may well 
be confident that interactions between three or more factors are negligible. In 
the particular case described in detail, there are only fifteen independent comparisons between the sixteen results which will be obtained, and it follows from the identity relationship that the comparison giving the effect A also measures the two-factor interactions BC, DL, EK, and FH. If, therefore, the factor a has no effect and there is an interaction between factors b and c or between any of the other pairs, the experimenter will conclude that the factor a has an effect. It is clear that under these circumstances the experimental results are worthless.


8. Efficiency of designs. In [2], Mood states for optimum designs that when $N$ weighings are made, the variances of the estimates of the weights are of the order of $\sigma^2/N$ in the case of the chemical balance and $4\sigma^2/N$ in the case of the spring balance, where $\sigma^2$ is the variance of a single weighing. As indicated above, when $N$ is a power of 2, the fractional factorial designs result in the same variances. These designs are similar to those proposed by Kishen [6].

Mood dealt with the case in which the number of weighings was restricted, 
for example to be equal to the number of objects, and defined a best design as that 
which gave the smallest confidence region in the p-dimensional space for the 
estimates of the p-weights. 

To take an example, for the weighing of 3 objects with a spring balance with no bias he suggests the following design:

$$X = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix},$$

where the rows of the array refer to weighing operations and the columns refer to objects. If the results of the weighings are $y_1$, $y_2$, and $y_3$ respectively, the estimates of the weights $b_1$, $b_2$, and $b_3$ are given by the equation

$$\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} 1/2 & 1/2 & -1/2 \\ 1/2 & -1/2 & 1/2 \\ -1/2 & 1/2 & 1/2 \end{pmatrix}\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}.$$

If $\sigma^2$ is the variance of a single weighing, then the variance of each estimate will be $[(1/2)^2 + (1/2)^2 + (-1/2)^2]\sigma^2 = (3/4)\sigma^2$. If $N$ ($\equiv 0 \pmod 3$) weighings are made by replicating the above system $N/3$ times, the variance of each estimate will be $9\sigma^2/4N$. The covariance of any two estimates is $(-1/4)\sigma^2$, so that the correlation between any two estimates is $-1/3$ and its square is $1/9$. The fractional factorial design will yield estimates which have a somewhat higher variance, namely $4\sigma^2/N$. The increase in precision of Mood's design has been obtained at the expense of correlated estimates, which in addition are subject to any bias which the measuring instrument may have. It is doubted whether the use of such designs for any practical problem can be justified.
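The figures quoted for Mood's design follow from the inverse of the design matrix. The NumPy sketch below reproduces the variance 3/4, the covariance −1/4, and the correlation −1/3, all in units of σ². It also shows the effect of a hypothetical zero error of 0.4 on the balance, with hypothetical true weights. Because each row of X⁻¹ sums to 1/2, every estimate is shifted by half the bias.

```python
import numpy as np

# Mood's design for 3 objects on a spring balance: rows are weighings, columns objects.
X = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1]], dtype=float)

X_inv = np.linalg.inv(X)   # estimates b = X^{-1} y
cov = X_inv @ X_inv.T      # covariance matrix of b, in units of sigma^2

var = cov[0, 0]               # variance of each estimate
corr = cov[0, 1] / var        # correlation between two estimates
per_weighing_mood = 3 * var   # N * Var(b_i) / sigma^2 when the design is repeated N/3 times
per_weighing_factorial = 4.0  # N * Var / sigma^2 for the spring-balance fractional factorial

# Effect of a hypothetical zero error (bias) of the balance on Mood's estimates.
w = np.array([2.0, 3.0, 5.0])   # hypothetical true weights
bias = 0.4
b_hat = X_inv @ (X @ w + bias)

print(X_inv)
print(var, cov[0, 1], corr)
print(per_weighing_mood, per_weighing_factorial)
print(b_hat - w)
```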

It is of interest to note that the concept of fractional replication may be extended to give designs requiring a number of weighings other than a power of two. For the weighing of 3 objects, for example, a factorial design of fraction 3/4 could be used. It could consist of a half-replicate based on the identity $I = ABC$, and a quarter replicate based on the identity

$$I = A = -BC = -ABC.$$

There is, however, little point in discussing such designs for the weighing problem, 
as their efficiency as measured by the total number of weighings needed to achieve 
a particular degree of accuracy is lower than for the designs described in this 
paper. 
MULTIPLE SAMPLING FOR VARIABLES

By Jack Silber 
Roosevelt College 

Summary. A multiple (sequential) sampling scheme designed to test certain hypotheses is discussed under the following assumption: $X$ is a random variable with density function $P(x)$ which is piecewise continuous and differentiable at its points of continuity. Formulae are derived for the probability of acceptance and rejection of the hypothesis and for the expected number of samples necessary for reaching a decision. These formulae are found to depend on the solution of a Fredholm integral equation. Explicit solutions to the problem are obtained for the case when $P(x)$ is rectangular, by reducing the fundamental integral equation to a set of differential-difference equations. Several examples are given.

1. Introduction. A multiple sampling scheme is here proposed which is based on cumulative sums of random variables. Bartky [1] has developed a theory of multiple sampling for attributes when the attribute can take only two values, with probabilities $p$ and $1 - p$ respectively. Formulae are there derived for the probabilities of acceptance and rejection of the null hypothesis and for the expected amount of sampling necessary for reaching a decision. In this paper the same type of formulae are developed for the case of variable sampling, where the underlying probability law for the variable is given by a piecewise continuous function for which derivatives exist at its points of continuity.

The whole theory of multiple sampling is closely related to Wald's [2] theory of sequential tests. The fundamental difference is that in the latter, probabilities of errors of the first and second kinds are assigned, and acceptance and rejection criteria are derived therefrom, while in the former the problem is solved in reverse order. There the acceptance and rejection criteria are assigned, and probabilities of eventual acceptance and rejection are derived. For different parameter values, these are the probabilities of making errors of the first and second kinds.

The problem presented here is similar to that given by Wald [3] in bis paper 
on cumulative sums In the present paper we waive the restriction that the 
expected number of items necessary for termination of the cumulating process 
be given explicitly as an integer. Since the theory given here is from the point 
of view of multiple sampling, the language appropriate to that theory will be 
used. 

2. The sampling scheme. Let X be a random variable with probability 
density function P(x ) which is piecewise continuous. One variate, say Xi , 
is selected and if mi > b, the hypothesis (for example the null hypothesis with 
respect to the mean) is accepted, and if < a, the hypothesis is rejected. If, 
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however, a < Xi < 6, another variate Xu is selected In the latter case similar 
criteria with respect to Xi + determine whether the hypothesis is to be accepted 
or this method of sampling continued. Or more formally, let 

r 

S T = Xi (v = 1, 2, 3, ‘ • 1 ), 

1=1 

where the cumulative sums S r are formed sequentially as follows for any integer 
r the cumulating process is terminated by acceptance of the hypothesis if S r > b 
and rejection if S r < a, but, if a < S, < b another variate a; r+1 is selected and the 
sum (S'r+i formed. The acceptance and rejection criteria are then applied as 
above. No attempt is made here to indicate the choice of the acceptance and 
rejection criteria. 


3. The probability of acceptance. If at the rtli unit the hypothesis is neither 
accepted nor rejected, then it must be true that a< S r <b. Let us denote the 
probability that this condition holds by 


(3.1) 


Yr(Sr) dS r , 


where Y r (S r ) is the probability density function for S r in the above described 
sampling scheme The probability density function for would then be given 

by 

(3 2) 7r+l(6V + l) = f Y r (Sr)P(S r+1 - S r ) dSr . 

The probabilities of accepting or rejecting the hypothesis on the rth trial are 
respectively 

(3.3) f " YriSr) dSr , r Yr (Sr) dS r , 

* b J—n 

and therefore the probabilities for eventual acceptance or rejection are given 
by 

(3.4) I\ = Z f YAS r ) dS r , P R = E f Y r {Sr) dS r . 


The probability that a < S n < b cannot exceed the probability that a < T n = 
+ xt + *a + • • • 4- x n < b on a single sample of n variates, that is Pr (a < 
S n < b) < Pr (a < T n < b). For distributions with positive variance, it can 
be shown that the right member of the above inequality approaches zero as n —> 0. 
Therefore, the process of sampling as outlined above will eventually lead to 
acceptance or rejection of the hypothesis. See Wald [3, p. 284] for a directproof 
that the probability that the left member of the above inequality holds for 
n = 1, 2, 3, • • • is zero, 
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Consider the linear integral (Fredholm) equation 


(3.5) 


ft 

Y(x) = Fi(x) + X Pix - «)F(e) 


ds, 


where Yiix) = P(x) and assume a solution of the form 

(3.6) T(x) = F t (s) + XY,(x) + X 2 F 3 (x) + ■ • • . 

That solutions, in power series in X, of the Fredholm equation exist when the 
kernel P(x - s) and the function Fi(z) have finite discontinuities is well known 
and the theory has been expounded by several authors. (For example see 
Goursat [4].) If the power series m X is substituted in the integral equation we 
obtain 


Fi(s) + XFi(as) + X 2 F 3 (z) + 


(3.7) 


= Fi(ir) -b X J [Fi(s) + XFa(s) + X 2 Fa(s) + ■ ■ 'jPGr s) ds 
= Yyix) + X f Fi(«)P(x - s) ds + X 2 f Y 2 (s)P(x - s) ds 

da •'a 

+ X s f Y 3 (s)P(x -«)* + ■■• 


Equating coefficients of like powers of X we see that 
(3.8) F r (x) = f Y r -i(s)P(x -s)ds, (r - 2,3, ■ ■ ■). 

J a 

Tins, however, is the probability distribution for S r , r = 2, 3, • ■ • under our 
sampling scheme, and therefore from the equations, 

(3 9) Y(x) = £ X ,_1 Y r (x) = Y,{x) + X f P(x - s)F(s) ds, 

r=l J a 


we have that the probability of eventual acceptance for X = 


(3 10) 


£ Y r (S r ) ds T = Y{%) dx. 

r=1 l 


1, 


is 


Thus our problem of finding a formula for the probability of eventual acceptance 
oi rejection of the statistical hypothesis under the above sampling scheme re¬ 
duces to that of finding a solution of a linear integral equation. 

The argument in this section has, of course, been entirely formal. However 
from the general theory of integral equations we know that the series solution 
(3.6) converges uniformly for X < 1/M{b — a) where P{x) < M, since P(x) 
is a piobabilitv density function. In equations (3.4) and (3.10) we give solutions 
for X = l and of course we assume that M(b — a) < 1. Since (3.6) is uniformly 
convergent the interchanges of integration and summation in (3.10) and (4.3) 
in the following section are allowable. 
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4. The expected amount of sampling. Since 


(4.1) 


f Yr-^Sr-l) dSr^ 

*'a 


is the probability that the rth sample will be reached, then the probability that 
on the rth sample, the hypothesis will be either accepted or rejected becomes 

(4.2) f IVlGSr-O dSr-1 - f Yr(Sr) dS r , 

"a •'a 

that is, the first term in this expression gives the probability that no terminating 
decision is made on the (r — l)st sample and the second term gives the proba¬ 
bility that a like decision is made on the rth sample. The difference of the 
two then gives the probability that a terminating decision (acceptance or rejec¬ 
tion) will be made on the rth sample. The expected number of units sampled 
will therefore be 

E = J - f P(x) dx + £ r [ T n_i(<$U) dSr- 1 - [ 7 r (S r ) dSf r l 

Ja r=n 2 ^ a -I 

(4.3) = 1 + £ f Vr(Sr) dSr = 1+ f £ ?,(*) *0 

r»l r=l 

= 1 + f Y(x) dx. 

J a 

Thus, the amount of sampling expected before a terminating decision is reached 
also depends upon the solution of the integral equation. We proceed to discuss 
the problem when P(x) is given by a rectangular distribution. 


5. Reduction to differential equations when P(x) is rectangular, 
the integral equation ® 


(5.1) Y*(z) = P*(z) + X f P*(z ~ t)Y*(t) dt, 


Consider 


where 


(5.2) 


F-M - i, 


— c < z — a < +c, 


= 0, z - a > c or z — a < -c, 


and in the integral equation 

a+a — c<2<&-f-a! — c. 

The parameter a is restricted to the values — c < « < c for the following reasons 
The rejection criterion a cannot be greater than c + a for, if so, rejection will be 
automatic on the first sample. Similarly the acceptance criterion b must be 
greater than -c + a for otherwise, acceptance would be automatic on the first 
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trial. If a > c then, rejection can never take place if it does not take place on 
the first trial for in this case all 2 > 0, Similarly, if a < c then, acceptance 
can never take place if it does not take place on the first trial for in this case all 
z < 0. Furthermore, in obtaining solutions of the integral equation, we will 
take a to be > 0 This inequality is no real restriction since solutions for nega¬ 
tive a can be obtained by symmetry from the solutions for positive a. 


If we let x = 2 — a then 


(5 3) 

Y*(x + a) = P*{x + 

a) + X P*(x + a - t)Y*(t) dt, 

or 



(5.4) 

y(s) = P(x) + X f P(x - t)Y*(t) dt, 

where 



(5.5) 

II 

, — c < x < + c; 


= 0, 

x < —c or x > +c. 

Now let s 

= t — a, then 



Y(x) = P(x) + X j 

f P(x — a - s)Y*(s + a) ds 

(5.6) 


a—a 

fb—ct 


J po —a 

P{x — a - s)Y(s) 

0—« 


ds. 


We have thus transformed our equation to one in which P(x) becomes sym¬ 
metrical with respect to the line x = 0. Furthermore, the probability of accept¬ 
ance becomes ® 


(5.7) 


P* = 



dx, 


and the expected amount of sampling becomes 


(5.8) 


E = 1 + f ‘ Y(x) dx 


Also, x now has the following range: a — c < x < b + c. If we now make the 
transformation x — a — s = y, then 


(5.9) 


Y(x) = P{x) + f P(y)Y(x — <x — y) dy, 

Jx—b 


and the following cases present themselves. 

If rb — (i < —c or x - b > +X, then F(x) s P(x), since P{y) = 0, 
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If X - b < 

—c < x — a < +c, then 

X p *—a 


(5 10) 

I'M-PM+| I. 

ns 

'3 

1 

<5 

1 

where 


a — c < re < a + c 

when b — a > 2c, 

(5.11) 


a — c<x<b — c 

when b — a < 2c. 


If * — b < — c < +c < x — a, then 
(5.12) Y(x) = PCs) + A £* Yb - a - y) dy, 

where 


(5.13) a+c<x<b—c and b — c > 2c. 

li—c<x — b<x — a< -\-c, then 

\ r x ~ a 

(514) Y(x) = P(x) + 7 T Y(x - a - y) dy, 

ZC Jx-b 

where 

(5.15) b —e < x < a +c and b — a < 2c. 

If -c < X — b < +c < x — a, then 

(5.16) Y{x) = P(x) + ~ r Y( - x ~ a ~ y ] d V' 

2c Jx-b 

where 


(5.17) 


b — c < x < b c when b — a > 2c, 

a + c<x<b + c when b — a < 2c. 

Transforming back to the variable s, we have for the case b - a > 

F(x) = P(x) + — / F(s) ds for a — c < x < a + c, 

ZC a 

x nx —a-fc 

(5.18) = P(x) + / F(s) ds for a + c < ® < b - c, 

2C •'x—a—c 

= P(t) + A f Y(s) ds for 6 — c < a 1 < 6 + c, 

2c Jx—at—o 


and for the case b — a < 2c, 


•. px — a+a 

F(.x) = P(:c) + jr- / F(s) ds for a — c<x<b — c, 

2 C "a—a 

(5.19) = P(.t) + A f F(s) ds for b — c < x < a + c, 

2C *a—a 

= P{x) + A f F(s) ds for a + c < x < b 4- c 

2C •'a— a—c 
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In all of the above equations, the integral is a continuous function of z, a, a , b, c 
while P(x) has a discontinuity at x = +e and x = — c, the jump at these points 
being of amount 1 /2c. The function Y(x) will therefore be such that 


(5 20) 


F(-c + 0) - F(-c - 0) = 1/2c, 
F(c - 0) - F(c + 0) = l/2c. 


If we now differentiate the above sets of integral equations with respect to % we 
obtain the following sets of differential-difference equations for the case X = 1. 
If fa — a > 2c, 


Y'(x) =±Y(x-a + c) 


for a — c < x < a + c, 


(5.21) 


= {F(:r — a + c) — F(r — a - c)} for a + c < x < b — c, 

2 c 


= — Y(x — a — c) for b — c < x < b + c, 

2c 

and, if fa — a < 2c, 

Y'(x) = 7(ai — a + c) for a — c < x < b — c, 

LiCi 


(5.22) = 0 


for b — c < x < a + c, 


= —^-Y(x- a-c) for a + c<x<b-\-b, 

Zc 

the derivatives holding for all points except at x = — c and x = -\-c. 

Although a technique has been devised to solve the above equations for finite 
a and fa, mathematical difficulties of a computational character are encountered 
when (fa — a) is made large. Note that there are only three essential parameters 
in the above problem since c can be taken as the unit of measurement In the 
technique illustrated by the following examples, a has been fixed as has (fa — a), 
i e. the solutions shown m the examples below are general only insofar as one 
parameter is concerned The essential feature of the technique is that the range 
of Y (s) has been further subdivided so as to make its points of discontinuity end 
points of subdivisions of its range, and thus Y{x) becomes continuous fronythe 
right or left in every subinterval of its range. 

6. Example I: fa — a = 2c, c = 1, a = 0. In this case x ranges from (a — 1) 
or (~c), whichever is smaller, to (fa + 1) or (+c), whichever is larger, If 
—c < a — 1, then F(a;) = P(x) for — c < x < a — I, and if fa + 1 < -fc, 
then F(j) = P(x) for fa + 1 < x < —c. For x between a — 1 and fa + 1 the 
domain of the differential-difference equations is divided as follows, where a 
is now restricted to the interval — 1 s a £ 0. 
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(6.1) Y((x) = £F, +2 (o: + 1) where for i = 1, a - 1 < x < -1, 

i = 2, — 1 < x < a, 

i = 3, a < a: < 0, 

i = 4, 0 < * < a + 1; 

Fl(z) = - ^F,_ 2 (x — 1) where for i = 5, o + 1 < * < +1, 

f = 6, +1 <x<q + 2, 

i — 7, a + 2 < x < +2, 

* = 8, +2<x<a+3. 

The above are the equations corresponding to (5.21) for the given example 
Differentiating the equations for i = 3, 4, 5, 6 and making certain obvious 
substitutions we obtain the following second order differential equations, 

(6.2) Y"(x) = -£F,(x), * = 3,4,5,6, 

where the domains for x are as in (6.1). If we solve the equations (6 2) and sub¬ 
stitute in the remaining equations in (6.1) we obtain the following set of equa¬ 
tions, 

F,(x) = A t+ 2 sin f(x + 1) - B, +2 cos %(x + 2) + Ii x , i ~ 1, 2, 

(6.3) Fi(x) = A( cos \x + B, sin $z, i - 3, 4, 5, 6, 

Y t (x) = —^4,-2 sin |(x - 1) -f B,_ 2 cos £(x - 1) + K,, i = 7, 8, 

where again the domains are as in (6.1) 

From continuity considerations we have the boundary conditions 

Fi(a - 1) = F s (a + 3) = 0, F 2 (-l) - ± = F a (-1), F*(a) = F,(a), 

F,(0) = F 4 (0), F.( a + 1) = F,(a + 1), F«(l) = F„(l) + i 

F.(a + 2) = F 7 (a + 2), F 7 (2) = F,(2). 

These boundary conditions yield certain relationships between the constants 
The equations so determined, however, do not form a consistent set of linear 
equations in the Ai , B,, K< • ■ • . If we integrate out the equations (5 18), 
sectionally, the following relationships between the constants are obtained. 

Ai = A 1+2 sin^ - B, +a cos^, B* = B t+2 cos \ + B i+l sin * = 3,4, 

Ki = -(Ai - At) sin $(a + 1) - (B s - Bi) cos l(a + 1) 

= j + Bk — Bj -(- K \, 

K-i = As - A t + 7C S , 1 K» = An sin §(a + 2) - B„ cos i(a + 2), 

(6.4) 

Bg = ^ + Bi + — At + A s -j- (A 4 — j4s) sin §(a + 1) 

+ (B 6 - Bi) cos i(a + 1), 

At = Aa + IU + (Ai - A 6 ) sin f (a + 1) + (B b - B 4 ) cos £(a + 1), 

Ki = -As sin £ + B s cos ^. 
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From these equations it is easily seen that At = A 3 and K\ = K 2 = Ki = Kg . 
Furthermore, the following set of consistent linear equations is obtained, after 
several simple manipulations and substitutions. 


sin \{a + 2) + sin - ■ sin A t 

U 


- jcos || B 3 + jcos l(a + 2) + sin ~ cos §j> = 0, 


(6.5) < - sin \(a + 2) + cos 4(u + 2) — cos ^ • sin £ > An 4- < sin - > B 3 


+ ssin |(a + 2) + cos 4 (a + 2) — cos ^ ■ cos j> Bo = 0, 


{cos 4} Ae — B 3 + {sin 4) B e = 0. 


All the other constants can be obtained from the solutions for A 6 , B 3 , B 6 in 

(6.5) Letting A equal the determinant of coefficients in (6.5) and using the 
relationships (6.4) we obtain the following solutions: 

A = 2 — 2 sin 4 — cos 4> 

AA< = 4(cos 4 - cos a/2 • sin 4 (a + 1)) = AA 3 , 

AB 4 = 4(sin 4 — sin a/2 • sin 4(a + 1) + cos 4 — 1} , 

AA 6 = 4(sin 1 — cos 4 + cos a/2 • cos 4(a + 2)}, 

(6.6) AB 0 = 4{si 11 4(<* + 2) cos a/2 — sin 4 — cos 1), 

AB 3 = 4(1 — sin a/2 • sin 4 (a 4- 1) — sin 41, 

AA 6 = 4(cos 4 - sin s 4 (a + 1)}, 

A B& = 4{sin 4 + sin 4(« + 1) • cos 4(o + 1) -1}, 

A7£i = 4 jcos | sin 4(n 4- 2)j = Alfa = A Ki = AA 8 . 

If we now integrate Y ( x ), equation (6,3) sectionally, i.e. from the left end point 
to the right end point of each sub-interval of its range and then add up appro¬ 
priate areas, we obtain the following formulae for the probabilities of acceptance 
and rejection and for the expected amount of sampling: 

Pr = ^ {cos 4(a 4- 1) 4- sin a/2 — cos a/2 4- AA 2 j, 

Pa = ^ {2 - cos 4 — 2 sin 4 4- sin 4(a + 1) 

— cos 4(a 4- 1) — sin a/2 4- A/C 2 }, 
E = ^ {cos a/2 — 2 sin a/2 — siri 4(a 4- 1)). 


(6.7) 
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7. Example II: a = 1, c = 3, b — a = 4. In this case, aa in the previous one, 
Y(x) = P(x ) for — 3<a;<o — 3 when a — c = a — 3 < —3 and if b + c = 
a + 7 < 3 then F(x) = P(a), o + 7 < a < 3. For a — 3 < x < a + 7 where 
a takes on only integral values between —5 and 3, we have the following set of 
differential-difference equations : 

Yi + ,(x) = |F a+J+s (x + 2), j = -3, -2, -1,0, 

(7.1) =0, j = 1,2, 

= ~ - 4), j = 3, 4, 5, 6. 


If we integrate the above equations for j = 1, 2, substitute in the equations for 
j = —1, 0, 5, 6, integrate, and then substitute in the remaining equations, we 
obtain the solutions 


(7.2) 


= rfr -4 .«+i+4(* + 2) 2 + , 

j = 

-3, - 

= ® "f" 

J = 

-1,0; 

= A a +, 

j = 

1,2; 

= — a(iC 4) 4 ® 4“ > 

j = 

3,4; 

= 'll “f“ -*4(1 + 7 , 

j = 

5, 6 


As in the previous example we now use (5 22). Integrating out (5.22) sec- 
tionally, certain relationships between the A a . hj , j - —3, — 2, • ■ ■ 6, are ob¬ 
tained. These yield 

ylo-t-i = rff (12P„_i + 12P„ -f 39P„+i + 9P a + 2 1, 

A u+2 = *{12P«_i + 12P a + llPa+i + 37 P„ +2 ), 


A 0 _! = - ~ {4P a _l + 4P a + 13Pa+l + 3P 
56 


0 + 2 ) 


(7.3) 


+ tw|228P 0 _ 1 + 60P„ + 55 P a+1 + 17Pa+ 2 ), 


Ac = - ^ {l2Pa-i + 12Pa + UPa+i + 37P a+2 ) 

luo 


+ TiE(60P o -i + 228 Pc + 55P a+1 + 17P s+2 ), 

where P a+ , is the value of P(x) for o + j < a; < a + j + 1, j = —3, • 6. 
All of the other constants can be found in terms of those given in equations 
(7.3). If we now integrate (7.2) sectionally and perform several simple manipu¬ 
lations, we arrive at the following formulas: 


-2 


P* = 2 P-+J + 9a 216 5 Aa+l + 


3a - 1 
216 


^4. a-f 2 ”1” ^ ^ ^ a— ^ 12 ^ ^ ? 


(7 4) Pa = Z P°+i + 3 ° 2l6 89 A a +i + 9a 216 1 - A„h2 + t^Ac-i + AAa, 


E = 1 + 2a — — A„4i + 2 ~ ] 2 ~~ A B+2 + A a+ i + A a 


12 


12 
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Although P^j , j = -6, -5j -4,7,8,9, have not appeared in previous deriva¬ 
tions in this example, they have been inserted in the above formulas to cover the 
cases in which a - c > -c or b + c < c. 

It should be mentioned that Kac [5] obtained the distribution of n (the ex¬ 
pected amount of sampling) by a process similar to that given in this paper, It 
is also interesting to note that the present paper could have been written entirely 
in the language of problems in Random Walk, 

The author has also worked on the case in which the distribution P(x) is tri¬ 
angular and parabolic. In these, as in the case of the rectangular distribution 
discussed in this paper for l - a large, the equivalent differential-difference 
equations are of large orders making the computation of solutions extremely 
tedious. As Kac [5] pointed out, the task of obtaining solutions in closed form 
for the case when P(x) is the normal law is extremely difficult, 
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ON THE CHARACTERISTIC FUNCTIONS OF THE DISTRIBUTIONS OF 
ESTIMATES OF VARIOUS DEVIATIONS IN SAMPLES 
FROM A NORMAL POPULATION 

By M. Kao 
Cornell University 


1. Summary. An explicit formula for the characteristic function of the 
deviation 

l -±\X K -X\“, a > 0, 

n i_i 

is derived for samples from a normal population. For a = 1 one can calculate 
the probability density function but the result does not seem to he in complete 
agreement with a recent formula of Goodwin [1], 


2. Introduction. Let X u • • ■ , X n be independent, normally distributed 
random variables each having mean 0 and variance 1. 

Let, as usual, 

__ Xi -j- -Xa ~b • 1 • b 
n 


and denote by F„(") the deviation 

(t) Yn(a) - -£ \Xk- *1°, “>° 

n i 

The purpose of this note is to show that 


( 2 ) 


Fn(£) = B[exp (t(F»(a))) 


1 r r“ 

= Vn ( V^)’* +1 L -L 


c a 


dx 


drj. 


It is easy to check that for a = 2 one obtains the well known expression 

^ _ h y n ~ m 

Moreover, if ct = 1 one can actually find the probability density of F n (l) The 
resulting expression is fairly complicated and it strongly resembles an expres¬ 
sion recently obtained by Goodwin [I], Except for the relatively simple case 
n = 3, it does not seem easy to verify that our formula is equivalent to that of 
Goodwin. 

Although deviations corresponding to values of a different from 1 and 2 are of 
little practical value the explicit formula (2) may be of some interest. It is 
also hoped that the method of derivation may prove useful in other cases. 

257 



258 


M. KAC 


3. The derivation of (2). We start with the observation that 

X and F„(a) 

are statistically independent (see e.g. Daly [2]). 

Denote by 

E*[\X\< e, exp (r£F n («))) 

the integral of exp (i£Y n (ot)) extended over that portion of the sample space in 
which | X | < e. Thus the conditional expectation exp F„(a)) | | X | < «} 
is given by the formula 


£(exp &Z„(«)) \ \X\ <t] = <’exP _ 

Prob { | X | < e} 

Because of the independence of X and F n (a) we have 

(3) E{ex p (i?F n («))) = I X I < «> e xp (iff^aj))j . 

Prob { | X | < e ) 

For the sake of simplicity we assume now that a > 1 and note that 
exp (ifF„(«)) - exp £ \ X k |“) | < 1 £ | | X k |“ - | X k - X |"| 

< fi ( I X, | + | X | ) 

Thus, on the portion of the sample space where | X | < e, we have 

exp (iZYM) - exp £ \ X K |“) < ^- € £ ( | Z fc | + e)*" 1 
\n i / n i 

and consequently 

E *\ I X | < e, exp (t£F„(«))) - E* | | X | < e, exp £ | Z* |“^ 

^ ~ E* { | X | < «, ± ( | X* | + *)° _1 \. 


Clearly E* | | X | < «, £ (| Z* | e) a 1 j , approaches 0, as e approaches 0, 
hence by (3) 


(4) E {exp (i£F n (a))} = lim 


E*l | Z | < e, exp £ | Zf* ^ 
= lim -L_vh i / 


Prob { I Z I < e { 



CHARACTERISTIC FUNCTIONS 


259 


Using the fact that 


we obtain easily 

E* { | X 
(5) 1 


= 1, 

| X | < «, 

- / — exp (i v X) d. v = 4, 

7T J— oc 7] 

1 x | = «, 

= 0 , 

| X I > 

< e , exp(jj E | X* |“)| 



= - f X — r ’ eI exp 1 £ (J I X, 1“ + nXO 
ir J- K ij ( n i 


dr,. 


The justification of interchanging of the order of integration (from - <* to ») and ' 
the operation E can he made quite simply (see e.g. Kac and Stemhaus [3]). 

Notice now that 


E | exp l £ ({ | Xj. + r, X k ) j 

= L exp (" l) exp l ( * 1 * r + vx) ^ = ^ v) 

and that <?„(£, rj) is absolutely integrable m (— », ») as a function of ij. 

Thus, as e —» 0, 

(6) - f <p n ((, 7j) dy ~ - f <?„(£, rj) di). 

7T J-x> 7} 7T J-to 

Furthermore (as « —> 0) 

(7) Prob { | X | < e } ~ 2e 

V 27T 

and combining this with (6), (5) and (4) we get 

(8) E{exp (iijY„(a ))} = ^7= <?„(£, 17) dr,. 


This, of course, is equivalent to (2). 

4. Density function of the mean deviation. If a = 1 one can obtain an ex¬ 
pression for the probability density/„(/9) of 7 n (a). We note first that 


L “ p (- 0 


exp - (f | x | + t]x) dx 


= n jf exp exp i(j: + i,)x dx 

+ n J exp ^ exp i(| — r,)x dx = n\R(^ + rj) -f- R{% — 17)) 
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where 


dij 


R(u ) = J exp ~ 2 ~^ exp (ira) d:r. 

Using (2) (with a = 1) we obtain 

W{) - v57sr‘ £ [S CO K ‘« + '»■'*« - ">' 

Let us first look at the summands corresponding to k = 0 and k = n We have ‘ 
f R\t ~ v) dn = f R n (r,) dn = [ K RT(( + ,) d„. 

J— OO j— 00 «*— 00 

Now, is the Fourier transform of 

f 0, x < 0, 

«’> = ( Mp (_"), *>„, 
and hence R n (t,) is the Fourier transform of the convolution 

f * f * • • * f = 


It is easily seen (using integration by parts) that 

m = 0 (m) 

for large | r, | and hence for n > 2, R"(r,) is absolutely integrable in (— co , co). 
It follows (classical inversion formula) that 


[ R n ( v ' dr, = 27rf W (0) 

V—03 

• Since for n > 2 , £ n) (x) is continuous and £ n) (x) = 0 for x <0 we 
have £"’(0) = 0. Thus 

re—1 / \ j' « 

m = (V^T 1 £ (fc ) jL k ‘ (€ + ^ *»• 

It is fairly easy to check that 

j [" R k (t + v)R n ~ h (t - v) dr, = 7T J exp (%)£ b) £"~ w da; 

so that 
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Finally, 


IM 


7rtl " -1 



Ah) 


( 

\2/ i \2) 


I have not been able, except for n = 3, to verify directly that this formula is 
identical with that of Goodwin. 
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NOTES 

This section is devoted to brief research and expository articles and other short Hems. 


A FUNCTIONAL EQUATION FOR WISHART’S DISTRIBUTION 

By G. Rasch 

State Serum Institute and Univetsity of Copenhagen 

1. Introduction. The sampling distribution of the moment matrix for ob¬ 
servations from a multivariate normal distribution was given by Wishart in 
1928 [1]. This proof involved rather advanced multidimensional geometry but 
since then two analytical proofs have been given: one by Wishart and Bartlett 
in cooperation with Ingham by the use of the characteristic function [2] and a 
second by Hsu by induction with regard to the dimension of the observa¬ 
tions, [3], 

In the following section is given a new derivation of the form of Wishart’s dis¬ 
tribution in which a fundamental property of the multivariate normal distribu¬ 
tion is utilized, viz. the invariance of the distribution type against a linear trans¬ 
formation. In section 3 the same principle is used for evaluation of the constant 
and determination of the moment matrix in the multidimensional normal 
distribution. 


2. Derivation of Wishart’s distribution. Let 1 

(1) i = (xi , • • • , a;*), 

denote a fc-dimensional normal variate with the mean vector 0 and the distribu¬ 
tion matrix 

( 2 ) * = W, 

viz. 


(3) 


pfx) 


= VAQg) _ 

(V2 *y 


ft is symmetrical and positive definite. 

Now consider n observations of x: Xi, • • , x„ , which are stochastically in¬ 
dependent. Their joint distribution is 



The estimation of $ is based upon the moment sums 


__ niij — ,, 

> Notations. Lower case latin and greek letters are scalars, boldface capital latin and 
greek letters denote matrices, and boldface lower case letters row vectors. * means trans¬ 
position A (A; stands for the determinant of the square matrix A. 
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which form the symmetrical, positive definite matrix 

(5) M = (m„) = 2x*x,,. 

In order to derive the distribution of M the straightforward procedure seems to 
be to transform the distribution of the sample (xi, ■ ■ , x„) to a distribution of 
M and some other variables which then should be integrated away. As such, 
the transformation, 

(6) x, = u„M 4 , M2u*u,, = 1, 

might serve. The matrix 



contains nL elements linked together with — - relations; (U) symbolizes 


n — 


k T J 


k of the elements taken as independent variables 


For the purpose of introducing M in the exponential term m (4) we shall 
define the ‘‘double dot multiplication” of two matrices: 


( 8 ) 


A ■ B = ( a „) 


(K) = DE a.A,. 

(*) O) 


for which we notice the rule 


(9) A • ■ (BCD) = C • • (B*AD*). 

As obviously 

x^x* = 2= 4? • • (x*x), 


we have 

( 10 ) 

and accordingly 
(11) p(M, (U)j 


2x„4>x* = 4 • • M, 


/v my 

M 

a(xi, • , x„) 

\W^) 7 

c 

m, m 


where denotes the jacobian of the transformation. On mtegrating with 

3 ( ) 

respect to (U) we obtain 

(12) p(M) = (\/A(¥))” ■ e" !t " M ■ r(M), 

where <a(M) is independent of 4?. From this it follows that p{xt, ■ ■ • , x„ | M} 
is independent of $, i.e. M is a sufficient statistic for 4>. 

In order to determine the mathematical form of <p(M) we shall apply an arbi¬ 
trary linear transformation to the original variates: 

(13) x r = x'A. 
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The new variates x( are obviously normally distributed about 0 with the dis¬ 
tribution matrix 

(14) = A$A*. 

Therefore the distribution function of the new moment matrix, given by 

(15) M = A*M'A, 


is 


(16) 


p{M'} = (VA«t')) n ■ « M V(M'). 


On the other hand the transformation from M to M' is a linear one, the jacobian 
of which therefore is a constant depending on A only: 


(17) 


3(M) 

3(M') 


= \p{ A), say. 


Consequently, 

(18) p{M'} = VW) ■ M • v(M) • |*(A) 
The two expressions for p{M') must be identical, and as 

(19) A(*') = A(<h)A 2 (A), 


and 


(20) •• M' = (A4-A*) • ■ M' = (A*M'A) • • = M • • 'I>, 

it follows that v>( M) satisfies the functional equation 

(21) I A(A) | V(M') = *(M) • | '/'(A) |. 

Now, since the transformation M = (AB)* M'(AB) may be earned out in two 
steps, iZ'(A) also satisfies a functional equation 

(22) iKAB) = *(A)*(B). 

Furthermore, if A is a diagonal matrix it is easily seen that 

(23) *(A) = (A(A))* +1 , 

and this relation holds generally. In fact, considering the case where the normal 
form of A is a diagonal matrix: 

A = TDT -1 , say, 

we get 

*(A) = ^(T^(D)^(T- 1 ) 

= (A(D))* +1 f (TT _1 ) 

= (A(A))* +1 , 

and by analytical continuation this is seen to be true for any A. 
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Now, inserting this result in the functional equation (21) and taking for A the 
real symmetrical square root of M so that 2 M' = 1, we readily obtain the 
solution 

(24) ' <p(M) = (A(M i )) n_i_1 *(1). 

It follows that 

(26) p{M) = 7iM(A($j) n/2 • • (A(M)) ( "- ft - 1 » 2 , 

where 7 ki n ) = <p( 1) is a constant which may be determined in various ways (cf. 
for instance Cramdr [4]). 

3. Other applications of the linear transformation. It may be noticed that 
the linear transformation also leads to simple derivations of two fundamental 
properties of the normal multivariate distribution itself, viz. determination of 
the constant and the relation between the moment matrix and the distribution 
matrix. 

Let 

(26) P{xj = y($) • 
and transform by 

(27) x - x'A. 

The new variable obviously has the distribution matrix (14) and the constant 
7 (<&'). But on the other hand direct transformation of (26) leads to 

P{x'l = 7 (*) • e~ ixW • 

= y($) I A(A) | 

and therefore we must have 

7<*') = ?(&) I A(A) 1 • 

For A = 4>~ 4 we get 4' = 1 and consequently 

y(*) = VA($) * y(l)» 

where obviously 

7(1) = (vfef- 

Considering 

M(4>) = J x*xp{X} dx, 

s Exists because M is positive definite. Let M = ODO* where O is orthogonal and D 
the diagonal form of M, then. Mi = 075*0* is real and symmetrical 
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the stale transformation gives 

M(f) = J A*x*xAp[x'} dx', 

= A*M(4>')A 

which for A = <f> -1 leaves us with 

M(3) = (4>') _1 * 

because M(l) = 1. 
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THE DISTRIBUTION OF A DEFINITE QUADRATIC FORM 

By Herbert Robbins 
University of North Carolina 

Let Ji, • • • , X n be independent normal variates with zero means and unit 
variances, let ai, ■ • • , a„ be positive constants and define 


( 1 ) 


-!*! + ■■■ + f x 2 *, 

W = Pr [U n < x], f n {x) = F'„{x). 


( 2 ) 

Setting 

(3) a = (ai ■ • • off" 

and using the convolution formula we may show by induction that for x > 0, 

( 4 ) 


(5) 


h-o r(tn -|- fc) 

F n (x) = a-'V" ± —. 

ts r+ fc + l) ’ 


where for fc = 0,1, • ■ • 

Ck = T _!n Z . ^ + ^"- i r( ^ + i ) >Q 

*1+ 'Z'l I * * * • • • (Xn 


( 6 ) 
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In particular, if ai = • • • = a n = 2, then using the known distribution of x with 
n degrees of freedom we have 


m = 


X ln-l e -ix 

2 in r(^n) 


X 1 "- 1 {-x) K _ X 4 "- 1 ^ c h (~x) k 

2i"rQn) fTo 2*/cl fro r(§re + /c) ’ 


so that 

_ T(in It) 7r y' Pfe a) 1 ■ • r (in 4- h) 

k ~ 2* k ir( 4 «) 2* , 1+ tii...f„' 

and therefore 

, . y> T(ii + $) ■ • ‘Tdn 4~ I) _ f 4 " r(fw + Zc) 

{) ■ it I ■■■*.! *ir(i») ■ 


Now in the general case let 


(8) 


a. = min {ai, • • • , a»}; 


t.lion from (6) and (7) we deduce that 


(9) 


c, c (—x) k 

r(Jn + k) 


< w* 

~ r(^w)fci ’ 


with a similar inequality for the general term of (5). 

It is difficult to evaluate numerically the coefficient c*. by a direct application 
of the definition (6). We shall therefore give a method by which the c k may be 
computed easily. We shall assume in what follows that the a, are distinct. 

let Yt , ■ • ■ , 7* be another set of variates with the same joint distribution 
as the Xi and independent of the Xt, and set 


( 10 ) 

( 11 ) 


Hi -v-2 I 
2 Xl + 


T7-2 r y2 


+i^ + 2 


+ 2 rni 


(? 2 „(x) = Pr [F 2 „ < as], ft»(*) = <&(*). 


Then by the convolution formula, 

J a, » I" * ' 

' /»(* - y)!M dy = a~ n x n ~' 2 < 2 
0 , 


(-»)» 
r(fc 4- n) ’ 


But we may show directly that, setting 

d3) g, = n (% - 

wo have 

(14) g tn (x) = (—l) n_1 2 ft«r* = (- D "" 1 g{ 2 ft«T 

4™! ,a3 * 1 


(i — 1* ' * ’ j 
(-*)* 


—K—2 


k\ 


Equating coefficients in (12) and (14) we derive the fundamental formula

(15) $\sum_{i=0}^{k} c_i c_{k-i} = a^n \sum_{i=1}^{n} q_i a_i^{-(k+1)} \qquad (k = 0, 1, \cdots).$

Thus, defining

(16) $2P_k = a^n \sum_{i=1}^{n} q_i a_i^{-(k+1)},$

we may write

(17) $\sum_{i=0}^{k} c_i c_{k-i} = 2P_k.$


From (6) or (17) it follows that

(18) $c_0 = 1.$

Thus we may solve (17) successively for the $c_k$ in terms of the $P_k$: for $j = 1, 2, \cdots$ in the first formula and $j = 0, 1, \cdots$ in the second,

(19) $c_{2j} = P_{2j} - \{c_1 c_{2j-1} + c_2 c_{2j-2} + \cdots + c_{j-1} c_{j+1} + \tfrac12 c_j^2\},$
     $c_{2j+1} = P_{2j+1} - \{c_1 c_{2j} + c_2 c_{2j-1} + \cdots + c_j c_{j+1}\}.$


When the $n$ constants $q_1, \cdots, q_n$ have been computed we may compute the $P_k$ by (16) and then the $c_k$ by (19) successively as far as desired. The inequality (9) gives an indication of the number of terms of the alternating series (4) or (5) which are necessary to secure a desired accuracy. A sharper result than (9) should certainly be possible when the $a_i$ are well separated.
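The computational scheme just described can be sketched in Python as follows. This is an illustrative implementation under the stated assumption that the $a_i$ are distinct; the function names (`q_constants`, `c_coefficients`, `c_direct`, `F_series`) are ours. The recursion folds the two cases of (19) into one line by splitting off the terms $c_0 c_k$ and $c_k c_0$ of (17), and `c_direct` evaluates definition (6) by brute force only as a cross-check.

```python
import math
from itertools import product

def q_constants(a):
    """(13): q_i = prod_{j != i} (a_j - a_i)^(-1); the a_i must be distinct."""
    n = len(a)
    return [1.0 / math.prod(a[j] - a[i] for j in range(n) if j != i) for i in range(n)]

def c_coefficients(a, K):
    """Return c_0, ..., c_{K-1} of (4)-(5) via (16)-(19)."""
    n = len(a)
    g = math.prod(a) ** (1.0 / n)                  # the constant a of (3)
    q = q_constants(a)
    # (16): 2 P_k = a^n * sum_i q_i * a_i^{-(k+1)}
    P = [0.5 * g**n * sum(qi * ai ** (-(k + 1)) for qi, ai in zip(q, a)) for k in range(K)]
    c = [1.0]                                      # (18)
    for k in range(1, K):
        # (17) with c_0 c_k and c_k c_0 split off; equivalent to both lines of (19)
        inner = sum(c[i] * c[k - i] for i in range(1, k))
        c.append(P[k] - 0.5 * inner)
    return c

def c_direct(a, k):
    """c_k straight from definition (6); practical only for small n and k."""
    total = 0.0
    for idx in product(range(k + 1), repeat=len(a)):
        if sum(idx) == k:
            term = 1.0
            for i, ai in zip(idx, a):
                term *= math.gamma(i + 0.5) / (math.factorial(i) * ai**i)
            total += term
    return total / math.pi ** (len(a) / 2)

def F_series(a, x, K=60):
    """Partial sum of (5) using K coefficients."""
    n = len(a)
    g = math.prod(a) ** (1.0 / n)
    c = c_coefficients(a, K)
    s = sum(ck * (-x) ** k / math.gamma(n / 2 + k + 1) for k, ck in enumerate(c))
    return (x / g) ** (n / 2) * s

a = [1.0, 2.0, 3.5]
print([round(v, 6) for v in c_coefficients(a, 5)])
print([round(c_direct(a, k), 6) for k in range(5)])
```

For $n = 1$ and $a_1 = 2$, $U_1 = X_1^2$, so `F_series([2.0], 1.0)` should reproduce $\Pr[|X_1| < 1] = \operatorname{erf}(1/\sqrt2)$, which gives a simple check of the whole chain.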

The foregoing method may be modified to cover cases in which some of the $a_i$ are equal, the formulas (16) being replaced by the appropriate limits as the $a_i$ approach equality.

We shall now derive an expansion of $f_n(x)$ and $F_n(x)$ in terms of $\chi^2$ distributions. Let us set for $x > 0$,

(20) $f_n(x) = \sum_{k=0}^{\infty} (-1)^k d_k \dfrac{x^{\frac12 n + k - 1} e^{-x/a}}{a^{\frac12 n + k}\, \Gamma(\frac12 n + k)},$

or, equivalently,

(21) $f_n(x) = \dfrac{2}{a} \sum_{k=0}^{\infty} (-1)^k d_k \dfrac{(2x/a)^{\frac12 n + k - 1} e^{-x/a}}{2^{\frac12 n + k}\, \Gamma(\frac12 n + k)},$


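where the $d_k$ are to be determined. It follows, after some reduction, that

(22) $g_{2n}(x) = a^{-n} x^{n-1} e^{-x/a} \sum_{k=0}^{\infty} \left( \sum_{i=0}^{k} d_i d_{k-i} \right) \dfrac{(-x)^k}{a^k\, \Gamma(n + k)}.$

But we may write (14) in the form (the terms with $k < n - 1$ vanish)

(23) $g_{2n}(x) = a^{-n} x^{n-1} e^{-x/a} \sum_{k=n-1}^{\infty} \sum_{i=1}^{n} q_i a_i^{n-2-k} (a - a_i)^k \dfrac{(-x)^{k-n+1}}{a^{k-n}\, \Gamma(k + 1)}.$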
Equating coefficients in (22) and (23) and setting

(24) $2Q_k = a \sum_{i=1}^{n} q_i a_i^{-(k+1)} (a - a_i)^{n+k-1},$

we obtain the relations $d_0 = 1$ and

(25) $\sum_{i=0}^{k} d_i d_{k-i} = 2Q_k \qquad (k = 0, 1, \cdots),$

from which the $d_k$ may be computed as in (19). Equation (20) or (21) then gives the expansion of $f_n(x)$ in a series of $\chi^2$ frequency functions. The corresponding expansion of $F_n(x)$ is then


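(26) $F_n(x) = \sum_{k=0}^{\infty} (-1)^k d_k \int_0^x \dfrac{t^{\frac12 n + k - 1} e^{-t/a}}{a^{\frac12 n + k}\, \Gamma(\frac12 n + k)}\, dt$

or

(27) $F_n(x) = \sum_{k=0}^{\infty} (-1)^k d_k \int_0^{2x/a} \dfrac{t^{\frac12 n + k - 1} e^{-\frac12 t}}{2^{\frac12 n + k}\, \Gamma(\frac12 n + k)}\, dt.$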


By direct comparison of (4) and (20) we may establish the following relations among the $c_k$ and $d_k$:

(28)

These may be used if both the power series and the $\chi^2$ series are desired.
From (6) we see directly that

(29) $c_1 = \dfrac12 \sum_{i=1}^{n} \dfrac{1}{a_i},$

and from (28) it follows that

(30) $d_1 = \tfrac12 n a b_1,$

where we have set

(31) $b_1 = \dfrac1n \sum_{i=1}^{n} a_i^{-1} - (a_1 \cdots a_n)^{-1/n}.$

Since by a well known inequality $b_1 \ge 0$, it follows that $d_1 \ge 0$, with equality only if all the $a_i$ are equal. If we denote by $h_n(x)$ the frequency function of $\tfrac12 a(X_1^2 + \cdots + X_n^2)$, then
$h_n(x) = \dfrac{x^{\frac12 n - 1} e^{-x/a}}{a^{\frac12 n}\, \Gamma(\frac12 n)},$

and hence (21) becomes

$f_n(x) = h_n(x)\,[1 - b_1 x + O(x^2)],$

which is significant for $x$ near 0.
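The $d_k$ of the $\chi^2$ expansion can be computed in the same way as the $c_k$. The Python sketch below is our own illustration (hypothetical names `d_coefficients` and `b1`, distinct $a_i$ assumed): it evaluates $Q_k$ from (24), solves (25) with the recursion used for (19), and compares $d_1$ with the closed form (30)-(31).

```python
import math

def d_coefficients(a, K):
    """Return d_0, ..., d_{K-1} of the chi-square expansion (20), via (24)-(25)."""
    n = len(a)
    g = math.prod(a) ** (1.0 / n)                  # the constant a of (3)
    q = [1.0 / math.prod(a[j] - a[i] for j in range(n) if j != i) for i in range(n)]  # (13)
    # (24): 2 Q_k = a * sum_i q_i * a_i^{-(k+1)} * (a - a_i)^{n+k-1}
    Q = [0.5 * g * sum(qi * ai ** (-(k + 1)) * (g - ai) ** (n + k - 1) for qi, ai in zip(q, a))
         for k in range(K)]
    d = [1.0]                                      # d_0 = 1
    for k in range(1, K):
        inner = sum(d[i] * d[k - i] for i in range(1, k))   # (25), solved as in (19)
        d.append(Q[k] - 0.5 * inner)
    return d

def b1(a):
    """(31): b_1 = (1/n) * sum(1/a_i) - (a_1 ... a_n)^(-1/n)."""
    n = len(a)
    return sum(1.0 / ai for ai in a) / n - math.prod(a) ** (-1.0 / n)

a = [1.0, 2.0, 3.5]
n, g = len(a), math.prod(a) ** (1.0 / len(a))
d = d_coefficients(a, 4)
print(d[1], 0.5 * n * g * b1(a))   # per (30) these two numbers should agree
```

When all $a_i$ coincide (here, trivially, $n = 1$) every $d_k$ with $k \ge 1$ vanishes and (20) reduces to the single $\chi^2$ term $h_n(x)$.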


EXACT LOWER MOMENTS OF ORDER STATISTICS IN SMALL SAMPLES 
FROM A NORMAL DISTRIBUTION 

By Howard L. Jones 
Illinois Bell Telephone Company 

1. Summary. Exact means in samples of size ≤ 3, and exact second moments and product-moments in samples of size ≤ 4, are given in Table 1 in terms of π for order statistics selected from the normal distribution N(0, 1). The derivation employs multiple integration and some general properties of the moments.

TABLE 1

Expected values of lower moments of order statistics, $x_i > x_{i+1}$, in samples of size $n$ from the normal distribution $N(0, 1)$.

Moment        n = 2       n = 3                  n = 4
E[x_1]        1/√π        3/(2√π)
E[x_2]                    0
E[x_1^2]      1           1 + √3/(2π)            1 + √3/π
E[x_2^2]                  1 − √3/π               1 − √3/π
E[x_1 x_2]    0           √3/(2π)                √3/π
E[x_1 x_3]                −√3/π                  −(2√3 − 3)/π
E[x_1 x_4]                                       −3/π
E[x_2 x_3]                                       (2√3 − 3)/π
σ_1^2         1 − 1/π     1 − (9 − 2√3)/(4π)
σ_2^2                     1 − √3/π
σ_12          1/π         √3/(2π)
σ_13                      (9 − 4√3)/(4π)



2. Introduction. The usefulness of the lower moments of order statistics for determining the moments of the range and for other purposes is well established. In small samples, however, computation of the moments by quadrature is laborious [1]. The values shown in Table 1 should therefore be helpful in problems requiring the use of these moments for samples of size ≤ 4, since the constant π has been evaluated to several hundred decimal places. Some of the methods used to obtain these results may also be useful in approximating or verifying the moments in larger samples.
3. Multiple integration. Let $n$ random selections from the normal distribution $N(0, 1)$ be arranged in order of size so that

$x_1 > x_2 > \cdots > x_n.$

For samples of size 2, the means and product-moment are easily obtained from the general formula

$E[x_i^h x_j^k] = n! \int_{-\infty}^{\infty} \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_{n-1}} x_i^h x_j^k f(x_1) f(x_2) \cdots f(x_n)\, dx_n \cdots dx_2\, dx_1, \qquad f(x) = \dfrac{1}{\sqrt{2\pi}} e^{-\frac12 x^2},$

$E[x_j^k]$ being the special case where $h = 0$. Multiple integration can also be used to find any product-moment, $E[x_i x_j]$, for samples of size 3, the order of integration being changed at any stage where necessary.

For the means in samples of size 3 and the product-moments in samples of 
size 4, the integrals reduce to double integrals which can be evaluated from the 
equation 


f f dtidU = ~. 

2 ab 


This equation follows from the fact that 


is equivalent to 


while the function 


r r ab dk dh 

^bt^fa IT 

n dpi dpt , 

_P2 

m = f b2tl f <r s<i dt i 

•it. 


has the symmetrical property that $\phi(t_1) = -\phi(-t_1)$, whence

$\int_{-\infty}^{\infty} \phi(t_1)\, dt_1 = 0.$

4. Some properties of the moments. The most obvious property of the moments of order statistics in samples from the normal distribution $N(0, 1)$ is their symmetry; thus:

$E[x_i] = -E[x_{n-i+1}],$

$E[x_i^2] = E[x_{n-i+1}^2],$

$E[x_i x_j] = E[x_{n-i+1} x_{n-j+1}].$
When sample values from any parent distribution are numbered in order of random selection, $x_i'$ and $x_j'$ ($i \ne j$) are statistically independent of each other, and the expected value of a product $x_i'^h x_j'^k$ is the product of the expected values of $x_i'^h$ and $x_j'^k$. Numbering in order of size has the effect of increasing some expected values and decreasing others, leaving the sum of expected values of a given type unchanged, so that in general,

$\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( E[x_i^h x_j^k] + E[x_i^k x_j^h] \right) = n(n-1)\, E[x_0^h]\, E[x_0^k],$

where $x_0$ is a random selection. In particular, this equation holds for the special cases $(k = 1, h = 1)$, $(k = 1, h = 0)$, and $(k = 2, h = 0)$, so that in samples from the normal distribution $N(0, 1)$,

$\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[x_i x_j] = \tfrac12 n(n-1) (E[x_0])^2 = 0,$

$\sum_{i=1}^{n} E[x_i] = n E[x_0] = 0,$

$\sum_{i=1}^{n} E[x_i^2] = n E[x_0^2] = n.$

The foregoing relationships lead immediately to the evaluation of $E[x_1^2]$ and $E[x_1 x_2]$ in samples of size 2. (The generalization of these relationships was suggested by Professor John H. Smith, whose unpublished manuscript on sampling from a rectangular distribution has also been instructive.)

In samples from a normal distribution, the covariance of every order statistic with the sample mean is the same as the variance of the sample mean. This implies that the variance of the sample mean is ≤ the variance of any order statistic, the ratio of one standard deviation to the other being equal to the coefficient of correlation between the sample mean and the order statistic. To derive these properties, consider the linear function

$m = w_1 X_1 + w_2 X_2 + \cdots + w_n X_n$

of the order statistics $X_1, X_2, \cdots, X_n$ in a sample selected from the normal distribution $N(\mu, \sigma)$ with unknown $\mu$ and $\sigma$. Let

$x_i = (X_i - \mu)/\sigma, \qquad i = 1, 2, \cdots, n.$

The conditions $w_1 + w_2 + \cdots + w_n = 1$ and $w_{n-i+1} = w_i$ are sufficient to make $m$ an unbiased estimate of $\mu$ with variance $\sigma^2 E[(w_1 x_1 + w_2 x_2 + \cdots + w_n x_n)^2]$. The $w$'s that make this variance minimum must satisfy the equations obtained by replacing $w_i$ with $w_{n-i+1}$, for $i > \frac12(n + 1)$, in the expression

$E[(w_1 x_1 + w_2 x_2 + \cdots + w_n x_n)^2] + \lambda (w_1 + w_2 + \cdots + w_n - 1)$

and then setting the partial derivative with respect to each $w$ equal to zero. This leads to

$\sum_{j=1}^{n} w_j E[x_i x_j] + \sum_{j=1}^{n} w_j E[x_{n-i+1} x_j] + \lambda = 0, \qquad 1 \le i \le n,$
where the summations include the terms $E[x_i^2]$ and $E[x_{n-i+1}^2]$, respectively. But it is known [2] that the sample mean is the regular unbiased estimate of $\mu$ with minimum variance. Setting each $w$ equal to $1/n$ and combining equivalent terms (by symmetry, $\sum_j E[x_{n-i+1} x_j] = \sum_j E[x_i x_j]$) yields

$\sum_{j=1}^{n} E[x_i x_j] + \tfrac12 n \lambda = 0, \qquad i = 1, 2, \cdots, n.$

Summing from $i = 1$ to $i = n$, and employing the relationships discussed in the preceding paragraph, we obtain

$n + \tfrac12 n^2 \lambda = 0,$

whence

$\lambda = -2/n,$

and

$\sum_{j=1}^{n} E[x_i x_j] = 1, \qquad i = 1, 2, \cdots, n,$

where the summation includes the term $E[x_i^2]$. This equation leads to the properties mentioned at the beginning of this paragraph. The same equation can be used to evaluate $E[x_1^2]$ and $E[x_2^2]$ in samples of size 3 or 4 from the distribution $N(0, 1)$, after the product-moments have been found.
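The identities of Section 4 offer a quick consistency check on Table 1. The Python sketch below is our own illustration (the dictionary `TABLE1` and the helpers `moment_matrix` and `simulate` are hypothetical names). It fills in the full matrix of $E[x_i x_j]$ from the tabulated entries by the symmetry $E[x_i x_j] = E[x_{n-i+1} x_{n-j+1}]$, then checks that every row sums to 1, that $\sum_i E[x_i^2] = n$, and that the off-diagonal products sum to 0. It also adds a crude, seeded Monte Carlo estimate of one entry.

```python
import math
import random

SQ3, PI = math.sqrt(3.0), math.pi

# E[x_i x_j] for i <= j as printed in Table 1 (n = 3 and n = 4); other pairs follow by symmetry.
TABLE1 = {
    3: {(1, 1): 1 + SQ3 / (2 * PI), (2, 2): 1 - SQ3 / PI,
        (1, 2): SQ3 / (2 * PI), (1, 3): -SQ3 / PI},
    4: {(1, 1): 1 + SQ3 / PI, (2, 2): 1 - SQ3 / PI, (1, 2): SQ3 / PI,
        (1, 3): -(2 * SQ3 - 3) / PI, (1, 4): -3 / PI, (2, 3): (2 * SQ3 - 3) / PI},
}

def moment_matrix(n):
    """Full n x n matrix of E[x_i x_j], using E[x_i x_j] = E[x_{n-i+1} x_{n-j+1}]."""
    known = TABLE1[n]
    M = [[0.0] * n for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            lo, hi = min(i, j), max(i, j)
            if (lo, hi) not in known:            # reflect the pair about the middle of the sample
                lo, hi = n - hi + 1, n - lo + 1
            M[i - 1][j - 1] = known[(lo, hi)]
    return M

def simulate(n, trials=20_000, seed=1):
    """Monte Carlo estimate of E[x_i x_j] for N(0, 1) samples sorted so that x_1 >= ... >= x_n."""
    rng = random.Random(seed)
    acc = [[0.0] * n for _ in range(n)]
    for _ in range(trials):
        x = sorted((rng.gauss(0.0, 1.0) for _ in range(n)), reverse=True)
        for i in range(n):
            for j in range(n):
                acc[i][j] += x[i] * x[j]
    return [[v / trials for v in row] for row in acc]

for n in (3, 4):
    M = moment_matrix(n)
    row_sums = [sum(row) for row in M]           # Section 4: each should equal 1
    trace = sum(M[i][i] for i in range(n))       # sum of E[x_i^2]: should equal n
    print(n, [round(s, 12) for s in row_sums], round(trace, 12))

print(simulate(4)[0][3], -3 / PI)                # E[x_1 x_4] for n = 4, rough check only
```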
NOTE ON AN ASYMPTOTIC EXPANSION OF THE nTH DIFFERENCE OF ZERO

By L. C. Hsu

National Tsing-Hua University, Peiping, China

This note gives an asymptotic expansion of the $n$th difference of zero. It is known that the Stirling number of the second kind is defined by

(1) $n!\, \mathfrak{S}_{n+k}^{n} = \Delta^n 0^{n+k} = \sum_{\nu=0}^{n} (-1)^{n-\nu} \binom{n}{\nu} \nu^{n+k}.$

We shall first show that the Stirling number $\mathfrak{S}_{n+k}^{n}$ can be expanded in the form
where $f_1, \cdots, f_t$ are polynomials in $k$ whose coefficients can be found by means of the following lemmas.

The first lemma is due to B. F. Kimball [1, (5.3)].

Lemma 1 (Kimball). Let $q$ be a real number such that $n + q > 0$, and let $f(x) = x^{n+q}$. Then we can write $\Delta^n f(0)$ in the form

(3) $\Delta^n f(0) = f^{(n)}(\tfrac12 n) \left[ 1 + \sum_{m \ge 1} \binom{q}{2m} W(m, n) \right],$

where the value of $W(m, n)$ is given by

(4) $W(m, n) = B_{2m}^{(-n)}(-\tfrac12 n) / (\tfrac12 n)^{2m},$

$B_\nu^{(-n)}(x)$ being a so-called Bernoulli polynomial of negative order, which was first defined by Nörlund [2].

Lemma 2. Let the sum of all $\binom{n}{k}$ products of $k$ different numbers taken from the set $(1, 2, \cdots, n)$ be denoted by $S_k(n)$. Then we can express it in the form

(5) $S_k(n) = \sum_{p=1}^{k} (-1)^{k-p} \lambda_p(k) \binom{n+p}{k+p},$

where the coefficients $\lambda_1(k), \lambda_2(k), \cdots$ satisfy the recurrence relation

(6) $(k + p) \lambda_{p-1}(k) + p \lambda_p(k) = \lambda_p(k + 1)$

with $\lambda_0 = 0$, $\lambda_1 = 1$ and $\lambda_{k+1}(k) = 0$.


Proof. Clearly, among all $\binom{n}{k+1}$ products of $(k + 1)$ numbers out of $(1, 2, \cdots, n)$, there are exactly $\binom{n-1}{k}$ products containing the greatest factor $n$. The sum of these products is therefore $n \cdot S_k(n - 1)$. Repeating this reasoning, we get

(7) $S_{k+1}(n) = n \cdot S_k(n - 1) + (n - 1) \cdot S_k(n - 2) + \cdots + (k + 1) \cdot S_k(k).$


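Evidently, (5) is true for $k = 1$. Suppose now that it is true for a given $k$. Then, using $(m + 1)\binom{m+p}{k+p} = (k + p + 1)\binom{m+p+1}{k+p+1} - p\binom{m+p}{k+p}$ and summing over $m$, the right-hand side of (7) can be written as

$\sum_{m=k}^{n-1} (m + 1) \sum_{p=1}^{k} (-1)^{k-p} \lambda_p(k) \binom{m+p}{k+p}$

$= \sum_{p=1}^{k} (-1)^{k-p} \lambda_p(k) \left[ (k + p + 1) \binom{n+p+1}{k+p+2} - p \binom{n+p}{k+p+1} \right]$

$= \sum_{p=1}^{k+1} (-1)^{k+1-p} \left[ (k + p) \lambda_{p-1}(k) + p \lambda_p(k) \right] \binom{n+p}{k+1+p}.$

The lemma thus follows by induction on $k$.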
The number $S_k(n)$ may be called a Stirling number of the first kind. By the lemma just proved, it is easy to find

(8) $S_2(n) = 3\binom{n+2}{4} - \binom{n+1}{3},$

$S_3(n) = 15\binom{n+3}{6} - 10\binom{n+2}{5} + \binom{n+1}{4},$

$S_4(n) = 105\binom{n+4}{8} - 105\binom{n+3}{7} + 25\binom{n+2}{6} - \binom{n+1}{5},$

$S_5(n) = 945\binom{n+5}{10} - 1260\binom{n+4}{9} + 490\binom{n+3}{8} - 56\binom{n+2}{7} + \binom{n+1}{6}.$
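Lemma 2 is easy to check mechanically. The Python sketch below (our own helper names `lambdas`, `S_by_lemma`, `S_by_definition`) builds the $\lambda_p(k)$ from the recurrence (6) starting at $\lambda_1(1) = 1$, evaluates $S_k(n)$ by (5), and compares the result with the definition as a sum of products of $k$ distinct numbers from $1, \cdots, n$. The printed row for $k = 5$ reproduces the coefficients 1, 56, 490, 1260, 945 of $S_5(n)$ in (8).

```python
from itertools import combinations
from math import comb, prod

def lambdas(k):
    """[lambda_0(k), ..., lambda_{k+1}(k)] from recurrence (6), starting at k = 1 (k >= 1)."""
    row = [0, 1, 0]                    # lambda_0(1) = 0, lambda_1(1) = 1, lambda_2(1) = 0
    for kk in range(1, k):
        # (6): lambda_p(kk + 1) = (kk + p) * lambda_{p-1}(kk) + p * lambda_p(kk)
        row = [0] + [(kk + p) * row[p - 1] + p * row[p] for p in range(1, kk + 2)] + [0]
    return row

def S_by_lemma(k, n):
    """S_k(n) from formula (5); k >= 1."""
    lam = lambdas(k)
    return sum((-1) ** (k - p) * lam[p] * comb(n + p, k + p) for p in range(1, k + 1))

def S_by_definition(k, n):
    """Sum of all products of k distinct numbers taken from 1, ..., n."""
    return sum(prod(c) for c in combinations(range(1, n + 1), k))

print(lambdas(5))   # [0, 1, 56, 490, 1260, 945, 0]
print([S_by_lemma(3, n) for n in range(3, 8)], [S_by_definition(3, n) for n in range(3, 8)])
```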

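We shall see that in order to compute the coefficients of $f_1(k), f_2(k), \cdots$, it is sufficient to compute the values of $W(m, n), \lambda_1(m), \lambda_2(m), \cdots$ $(m = 1, 2, \cdots, t)$. Let $f(x) = x^{n+k}$. Then by Lemma 1, we have

$n!\, \mathfrak{S}_{n+k}^{n} = \left[ \dfrac{d^n f(x + \frac12 n)}{dx^n} \right]_{x=0} \left[ 1 + \sum_{m \ge 1} \binom{k}{2m} W(m, n) \right].$

From the definition of $S_k(n)$ it is easily seen that

$(n + k)(n + k - 1) \cdots (n + 1) = n^k + n^{k-1} S_1(k) + \cdots + n S_{k-1}(k) + S_k(k).$

Hence we may write

$\dfrac{1}{n!} \left[ \dfrac{d^n f(x + \frac12 n)}{dx^n} \right]_{x=0} = \dfrac{n^{2k}}{2^k k!} \left( 1 + \dfrac{S_1(k)}{n} + \dfrac{S_2(k)}{n^2} + \cdots + \dfrac{S_k(k)}{n^k} \right).$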

It is clear from Kimball’s paper [1] that 


W(m, n) +0{n~ l ~ 1 ). 


ULy^-tQ:) 

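Substituting, we obtain

(9) $\mathfrak{S}_{n+k}^{n} = \dfrac{n^{2k}}{2^k k!} \left[ 1 + \sum_{m=1}^{t} \binom{k}{2m} W(m, n) + O(n^{-t-1}) \right] \left[ 1 + \sum_{m=1}^{t} \dfrac{S_m(k)}{n^m} + O(n^{-t-1}) \right]$

$= \dfrac{n^{2k}}{2^k k!} \left[ 1 + \sum_{m=1}^{t} \binom{k}{2m} W(m, n) + O(n^{-t-1}) \right] \left[ 1 + \sum_{m=1}^{t} \sum_{p=1}^{m} (-1)^{m-p} \lambda_p(m) \binom{k+p}{m+p} n^{-m} + O(n^{-t-1}) \right].$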
The last expression shows that the asymptotic expansion (2) can be obtained by computing the numbers $\lambda_p(m)$, $W(m, n)$ with $1 \le p \le m \le t$. For example, consider the case $t = 3$ and notice that [1, (2.13)]



and that $\lambda_1 = 1$, $\lambda_2(2) = 3$, $\lambda_2(3) = 10$, $\lambda_3(3) = 15$. Then by a straightforward calculation of the right-hand side of (9) and by comparison with (2), we find

(10) $f_1(k) = \tfrac13 (2k^2 + k),$

$f_2(k) = \tfrac{1}{18} (4k^4 - k^2 - 3k),$

$f_3(k) = \tfrac{1}{810} (40k^6 - 60k^5 - 2k^4 - 63k^3 + 133k^2 - 48k).$
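Finally, combining (2) with the well-known Stirling's formula [3]

(11) $n! = \sqrt{2\pi n}\, n^n e^{-n} \left[ 1 + \dfrac{1}{12n} + \dfrac{1}{288n^2} - \dfrac{139}{51840n^3} + O(n^{-4}) \right],$

and noting (1), we obtain

(12.1) $\Delta^n 0^{n+k} = \sqrt{2\pi n} \left(\dfrac{n}{e}\right)^n \dfrac{n^{2k}}{2^k k!} \left[ 1 + \dfrac{g_1(k)}{n} + \dfrac{g_2(k)}{n^2} + \dfrac{g_3(k)}{n^3} + O(n^{-4}) \right],$

where $g_1(k), g_2(k), g_3(k)$ are polynomials in $k$, viz.

(12.2) $g_1(k) = \tfrac{1}{12} (8k^2 + 4k + 1),$

$g_2(k) = \tfrac{1}{288} (64k^4 - 40k + 1),$

$g_3(k) = \tfrac{1}{51840} (2560k^6 - 3840k^5 + 832k^4 - 4032k^3 + 8392k^2 - 3732k - 139).$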

The asymptotic formula of $\Delta^n 0^{n+k}$ just derived is much better than a result previously obtained [4]. Moreover, it may be noted that the asymptotic expansion of $\Delta^n 0^{n+k}$ may be made as sharp as desired, since in fact, for any prescribed $t \ge 1$, $\lambda_p(m)$ and $B_{2m}^{(-n)}(-\tfrac12 n)$, $(1 \le m \le t)$, may be easily computed by (6) and Kimball's [1, (2.12)] respectively.
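The polynomials in (10) and (12.2) can be checked against exact Stirling numbers. The Python sketch below is our own illustration (hypothetical names `stirling2`, `f_polys`, `g_polys`, `relative_error`). It computes $\mathfrak{S}_{n+k}^{n}$ exactly from (1) using integer arithmetic and measures the relative error of the three-term expansion (2) with the $f_i$ of (10). It also verifies that (12.2) is exactly what results from multiplying (2) by the Stirling series (11).

```python
from fractions import Fraction
from math import comb, factorial

def stirling2(N, n):
    """S_{N}^{n} via (1): n! S = sum_v (-1)^(n-v) C(n, v) v^N, in exact integers."""
    return sum((-1) ** (n - v) * comb(n, v) * v ** N for v in range(n + 1)) // factorial(n)

def f_polys(k):
    """f_1, f_2, f_3 of (10) as exact fractions."""
    k = Fraction(k)
    return ((2 * k**2 + k) / 3,
            (4 * k**4 - k**2 - 3 * k) / 18,
            (40 * k**6 - 60 * k**5 - 2 * k**4 - 63 * k**3 + 133 * k**2 - 48 * k) / 810)

def g_polys(k):
    """g_1, g_2, g_3 of (12.2) as exact fractions."""
    k = Fraction(k)
    return ((8 * k**2 + 4 * k + 1) / 12,
            (64 * k**4 - 40 * k + 1) / 288,
            (2560 * k**6 - 3840 * k**5 + 832 * k**4 - 4032 * k**3
             + 8392 * k**2 - 3732 * k - 139) / 51840)

def relative_error(n, k):
    """Exact S_{n+k}^{n} divided by the three-term form of (2), minus 1."""
    f1, f2, f3 = f_polys(k)
    lead = Fraction(n ** (2 * k), 2**k * factorial(k))
    approx = lead * (1 + f1 / n + f2 / n**2 + f3 / n**3)
    return float(Fraction(stirling2(n + k, n)) / approx - 1)

# (12.2) should equal (10) multiplied by the bracket of Stirling's series (11).
s1, s2, s3 = Fraction(1, 12), Fraction(1, 288), Fraction(-139, 51840)
for k in range(8):
    f1, f2, f3 = f_polys(k)
    g1, g2, g3 = g_polys(k)
    assert (g1, g2, g3) == (f1 + s1, f2 + f1 * s1 + s2, f3 + f2 * s1 + f1 * s2 + s3)

print(relative_error(100, 3) * 100**4)   # about 5.6: the error behaves like 6/n**4 for k = 3
```

For $k = 1$ and $k = 2$ the expansion to $n^{-3}$ is exact, so `relative_error` returns 0 there; for $k = 3$ the first omitted term is $6/n^4$.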
AN INEQUALITY FOR KURTOSIS

By Louis Guttman
Cornell University

1. Summary. It is well known that, if the fourth moment about the mean of a frequency distribution equals the square of the variance, then the frequencies are piled up at exactly two points, namely, the two points that are one standard deviation away from the mean. In this paper is developed a general inequality which describes the piling up of frequency around these two points for the case where the fourth moment exceeds the square of the variance. In a sense, it is shown how “U-shaped” a distribution must be according to its second and fourth moments.

2. An inequality. Let $x$ be a random variable whose distribution has the following moments:

(1) $\mu = E(x); \quad \sigma^2 = E(x - \mu)^2; \quad (\alpha^2 + 1)\sigma^4 = E(x - \mu)^4.$

$\alpha^2$ is non-negative for any distribution (since $E(t^4) \ge [E(t^2)]^2 = 1$ for $t$ defined in (2)), and its positive square root will be denoted by $\alpha$. Let

(2) $t = (x - \mu)/\sigma.$

It will be shown that, if $\lambda$ is an arbitrary positive number, then

(3) $\text{Prob}\{1 - \lambda\alpha \le t^2 \le 1 + \lambda\alpha\} \ge 1 - \lambda^{-2}.$

If $\lambda$ is chosen so as to make the left member in the braces positive, then $t^2$ is bounded away from zero, and (3) becomes:

(4) $\text{Prob}\{\sqrt{1 - \lambda\alpha} \le |t| \le \sqrt{1 + \lambda\alpha}\} \ge 1 - \lambda^{-2}, \qquad (\lambda\alpha < 1).$

For example, if $\alpha = .5$ and $\lambda = \sqrt{2}$, then (4) shows that the probability is greater than .50 that $t$ is either between .54 and 1.30, or between $-1.30$ and $-.54$. If $\alpha = .2$ and $\lambda = 3$, then (4) shows that the probability is greater than .88 that $t$ is either between .63 and 1.27, or between $-1.27$ and $-.63$. In general, the smaller $\alpha$ is, the greater the probability that $t$ is in a small interval around $+1$ or $-1$. In particular, if $\alpha = 0$, then $\lambda$ may be taken arbitrarily large, so that (4) shows that the probability is unity that $t = \pm 1$; this is the well known case referred to above.
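The numerical examples for (4) can be reproduced directly, and the inequality can be checked on a concrete U-shaped law. In the Python sketch below (our own helper names `interval_4` and `arcsine_abs_cdf`), the computed endpoints agree with the note's two-decimal figures to within about 0.01. The test distribution is our choice, not the note's: $X = \cos \pi U$ with $U$ uniform, the arcsine law on $[-1, 1]$. For it, $\mu = 0$, $\sigma^2 = \frac12$ and $E X^4 = \frac38$, so $E t^4 = \frac32$ and $\alpha^2 = \frac12$. Its exact probabilities should never fall below the bound $1 - \lambda^{-2}$.

```python
import math

def interval_4(alpha, lam):
    """Endpoints for |t| in (4) and the bound 1 - lam**-2 (requires lam * alpha < 1)."""
    if not 0 <= lam * alpha < 1:
        raise ValueError("(4) needs 0 <= lam * alpha < 1")
    return math.sqrt(1 - lam * alpha), math.sqrt(1 + lam * alpha), 1 - lam**-2

def arcsine_abs_cdf(v):
    """P(|X| <= v) for X = cos(pi U), U uniform on (0, 1): the arcsine law on [-1, 1]."""
    return (2 / math.pi) * math.asin(min(max(v, 0.0), 1.0))

# The two worked examples in the text.
print(interval_4(0.5, math.sqrt(2)))
print(interval_4(0.2, 3))

# U-shaped test case: mu = 0, sigma^2 = 1/2, E X^4 = 3/8, hence alpha^2 = E t^4 - 1 = 1/2.
sigma = alpha = math.sqrt(0.5)
for lam in (1.05, 1.2, 1.4):
    lo, hi, bound = interval_4(alpha, lam)
    prob = arcsine_abs_cdf(hi * sigma) - arcsine_abs_cdf(lo * sigma)
    print(lam, round(prob, 4), round(bound, 4))   # exact probability vs. the bound of (4)
```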
3. Derivation. Inequality (3) is a special case of a slightly more general inequality which follows very simply from that of Tchebychef. Consider the function $t^2 - 1 + c$, where $c$ is an arbitrary real number. By using (1) and (2), it is seen that

(5) $E(t^2 - 1 + c)^2 = \alpha^2 + c^2.$

Then, according to Tchebychef's inequality, if $\lambda$ is an arbitrary positive number,

(6) $\text{Prob}\{(t^2 - 1 + c)^2 \le \lambda^2(\alpha^2 + c^2)\} \ge 1 - \lambda^{-2},$

or,

(7) $\text{Prob}\{1 - c - \lambda\sqrt{\alpha^2 + c^2} \le t^2 \le 1 - c + \lambda\sqrt{\alpha^2 + c^2}\} \ge 1 - \lambda^{-2}.$

This is the general inequality that was to be shown.

Inequality (3) is obtained by setting $c = 0$ in (7).

Another special case is obtained by determining $c$ so as to maximize the left member in the braces of (7). By differentiation, the maximizing value is found to be $c = -\alpha/\sqrt{\lambda^2 - 1}$, for which (7) becomes:

(8) $\text{Prob}\{1 - \alpha\beta \le t^2 \le 1 + \alpha(\beta^2 + 2)/\beta\} \ge 1 - 1/(\beta^2 + 1),$

where $\beta$ is used instead of the notation $\sqrt{\lambda^2 - 1}$, and denotes any positive number. For the same probability on the right, (8) has the advantage over (3) of having $1 - \alpha\beta$ greater than $1 - \lambda\alpha$, so that the former may be positive even though the latter is negative. Inequality (8) starts the positive interval for $t$ as close to $+1$ as possible. On the other hand, (3) provides the minimum size interval for $t^2$ from among all values of $c$ that make the left member in the braces of (7) positive.

If it is desired to have the positive interval for $t$ end as close to $+1$ as possible, then the right member in the braces of (7) is to be minimized. By differentiation, the minimizing value is found to be $c = \alpha/\sqrt{\lambda^2 - 1}$, and the corresponding inequality is:

(9) $\text{Prob}\{1 - \alpha(\beta^2 + 2)/\beta \le t^2 \le 1 + \alpha\beta\} \ge 1 - 1/(\beta^2 + 1).$

4. Distribution around $\mu$. If the left member in the braces of (7) is negative, then instead of giving information about the piling up of probability of $t$ around $+1$ and $-1$, (7) provides a statement about the probability in an interval around $\mu$. Alternatively, this may be regarded as a confidence interval for $\mu$. The minimum interval is given by (9); actually, it holds regardless of the value of the left member in the braces. Another way of stating it is:

(10) $\text{Prob}\{-\sqrt{1 + \alpha\beta} \le t \le \sqrt{1 + \alpha\beta}\} \ge 1 - 1/(\beta^2 + 1).$
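The optimizations behind (8) and (9) can be confirmed numerically. The Python sketch below uses hypothetical helper names and the arbitrary example values $\alpha = 0.3$, $\lambda = 2$. A crude grid search over $c$ in (7) should recover $c = \mp\alpha/\beta$, and at those values the members of (7) should take the closed forms $1 - \alpha\beta$, $1 + \alpha(\beta^2 + 2)/\beta$, and so on. A grid search is used only because it is transparent. Since the left member is concave in $c$ and the right member convex, any one-dimensional optimizer would do.

```python
import math

def left(c, alpha, lam):
    """Left member in the braces of (7)."""
    return 1 - c - lam * math.sqrt(alpha**2 + c**2)

def right(c, alpha, lam):
    """Right member in the braces of (7)."""
    return 1 - c + lam * math.sqrt(alpha**2 + c**2)

def grid_argmax(fun, lo=-5.0, hi=5.0, steps=200_001):
    """Crude grid search for the maximizer of fun on [lo, hi]."""
    best_c, best_v = lo, fun(lo)
    for s in range(1, steps):
        c = lo + (hi - lo) * s / (steps - 1)
        v = fun(c)
        if v > best_v:
            best_c, best_v = c, v
    return best_c, best_v

alpha, lam = 0.3, 2.0                    # arbitrary example values with lam > 1
beta = math.sqrt(lam**2 - 1)             # the beta of (8) and (9)

c_lo, L_best = grid_argmax(lambda c: left(c, alpha, lam))    # maximize the left member
c_hi, negR = grid_argmax(lambda c: -right(c, alpha, lam))    # minimize the right member
print(c_lo, -alpha / beta, L_best, 1 - alpha * beta)           # compare with (8)
print(c_hi, alpha / beta, -negR, 1 + alpha * beta)             # compare with (9)
```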
TABLE FOR ESTIMATING THE GOODNESS OF FIT OF EMPIRICAL DISTRIBUTIONS

By N. Smirnov

1. Editorial Note. The table presented on pp. 280-281 was originally published in [1]. It gives values of

$L(z) = 1 - 2\sum_{\nu=1}^{\infty} (-1)^{\nu-1} e^{-2\nu^2 z^2} = \dfrac{\sqrt{2\pi}}{z} \sum_{\nu=1}^{\infty} e^{-(2\nu-1)^2 \pi^2 / (8z^2)},$

which is also derived in [2].

Let $(X_1, \cdots, X_n)$ be a sample of independent variables with the same continuous cumulative distribution function $F(x)$, and let $N(z)$ be the number of $X_k$ which are $\le z$. By empirical distribution is meant the step-function $F_n(z) = N(z)/n$. The maximum $D_n$ of the difference $|F_n(z) - F(z)|$ is a random variable and $L(z)$ is the limiting cumulative distribution function of $n^{1/2} D_n$. If $D_{m,n}$ is the maximum of the difference $|F_m(z) - G_n(z)|$ between the empirical distributions $F_m$ and $G_n$ of two independent samples of sizes $m$ and $n$, respectively, then $L(z)$ is also the limiting cumulative distribution function of $(mn/(m + n))^{1/2} D_{m,n}$.
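Both series for $L(z)$ in the editorial note are easy to evaluate, and doing so gives a quick check on individual table entries. In the Python sketch below (our own function names), the alternating series converges fastest for moderate and large $z$ and the second series for small $z$. The switch point $z = 1$ is an arbitrary choice, and the comparison values are read from the table.

```python
import math

def L_alternating(z, terms=50):
    """L(z) = 1 - 2 * sum_{v>=1} (-1)^(v-1) exp(-2 v^2 z^2); fast unless z is small."""
    return 1 - 2 * sum((-1) ** (v - 1) * math.exp(-2 * v * v * z * z) for v in range(1, terms + 1))

def L_theta(z, terms=50):
    """Second form: sqrt(2 pi)/z * sum_{v>=1} exp(-(2v - 1)^2 pi^2 / (8 z^2)); suited to small z > 0."""
    s = sum(math.exp(-((2 * v - 1) ** 2) * math.pi**2 / (8 * z * z)) for v in range(1, terms + 1))
    return math.sqrt(2 * math.pi) / z * s

def L(z):
    """Use whichever series converges faster; the cut-off at z = 1 is arbitrary."""
    return L_theta(z) if z < 1.0 else L_alternating(z)

for z, printed in [(0.50, 0.036055), (0.54, 0.067497), (1.00, 0.730000), (2.00, 0.999329)]:
    print(z, round(L(z), 6), printed)
```

Agreement of the two series at any $z > 0$ (up to rounding) is a direct numerical check of the identity stated in the note.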
TABLE of L(z)

   z  L(z)             z  L(z)             z  L(z)
 .28  .000 001
 .29  .000 004       .69  .272 189      1.09  .814 342
 .30  .000 009       .70  .288 765      1.10  .822 282
 .31  .000 021       .71  .305 471      1.11  .829 950
 .32  .000 046       .72  .322 265      1.12  .837 356
 .33  .000 091       .73  .339 113      1.13  .844 502
 .34  .000 171       .74  .355 981      1.14  .851 394
 .35  .000 303       .75  .372 833      1.15  .858 038
 .36  .000 511       .76  .389 640      1.16  .864 442
 .37  .000 826       .77  .406 372      1.17  .870 612
 .38  .001 285       .78  .423 002      1.18  .876 548
 .39  .001 929       .79  .439 505      1.19  .882 258
 .40  .002 808       .80  .455 857      1.20  .887 750
 .41  .003 972       .81  .472 041      1.21  .893 030
 .42  .005 476       .82  .488 030      1.22  .898 104
 .43  .007 377       .83  .503 808      1.23  .902 972
 .44  .009 730       .84  .519 366      1.24  .907 648
 .45  .012 590       .85  .534 682      1.25  .912 132
 .46  .016 005       .86  .549 744      1.26  .916 432
 .47  .020 022       .87  .564 546      1.27  .920 556
 .48  .024 682       .88  .579 070      1.28  .924 505
 .49  .030 017       .89  .593 316      1.29  .928 288
 .50  .036 055       .90  .607 270      1.30  .931 908
 .51  .042 814       .91  .620 928      1.31  .935 370
 .52  .050 306       .92  .634 286      1.32  .938 682
 .53  .058 534       .93  .647 338      1.33  .941 848
 .54  .067 497       .94  .660 082      1.34  .944 872
 .55  .077 183       .95  .672 516      1.35  .947 756
 .56  .087 577       .96  .684 636      1.36  .950 512
 .57  .098 656       .97  .696 444      1.37  .953 142
 .58  .110 395       .98  .707 940      1.38  .955 650
 .59  .122 760       .99  .719 126      1.39  .958 040
 .60  .135 718      1.00  .730 000      1.40  .960 318
 .61  .149 229      1.01  .740 566      1.41  .962 486
 .62  .163 225      1.02  .750 826      1.42  .964 552
 .63  .177 753      1.03  .760 780      1.43  .966 516
 .64  .192 677      1.04  .770 434      1.44  .968 382
 .65  .207 987      1.05  .779 794      1.45  .970 158
 .66  .223 637      1.06  .788 860      1.46  .971 846
 .67  .239 582      1.07  .797 636      1.47  .973 448
 .68  .255 780      1.08  .806 128      1.48  .974 970

TABLE of L(z), continued

   z  L(z)             z  L(z)             z  L(z)
1.49  .976 412      1.89  .998 421      2.29  .999 944
1.50  .977 782      1.90  .998 536      2.30  .999 949
1.51  .979 080      1.91  .998 644      2.31  .999 954
1.52  .980 310      1.92  .998 744      2.32  .999 958
1.53  .981 476      1.93  .998 837      2.33  .999 962
1.54  .982 578      1.94  .998 924      2.34  .999 965
1.55  .983 622      1.95  .999 004      2.35  .999 968
1.56  .984 610      1.96  .999 079      2.36  .999 970
1.57  .985 544      1.97  .999 149      2.37  .999 973
1.58  .986 426      1.98  .999 213      2.38  .999 976
1.59  .987 260      1.99  .999 273      2.39  .999 978
1.60  .988 048      2.00  .999 329      2.40  .999 980
1.61  .988 791      2.01  .999 380      2.41  .999 982
1.62  .989 492      2.02  .999 428      2.42  .999 984
1.63  .990 154      2.03  .999 474      2.43  .999 986
1.64  .990 777      2.04  .999 516      2.44  .999 987
1.65  .991 364      2.05  .999 552      2.45  .999 988
1.66  .991 917      2.06  .999 588      2.46  .999 989
1.67  .992 438      2.07  .999 620      2.47  .999 990
1.68  .992 928      2.08  .999 650      2.48  .999 991
1.69  .993 389      2.09  .999 680      2.49  .999 992
1.70  .993 823      2.10  .999 705      2.50  .999 9925
1.71  .994 230      2.11  .999 728      2.55  .999 9956
1.72  .994 612      2.12  .999 750      2.60  .999 9974
1.73  .994 972      2.13  .999 770      2.65  .999 9984
1.74  .995 309      2.14  .999 790      2.70  .999 9990
1.75  .995 625      2.15  .999 806      2.75  .999 9994
1.76  .995 922      2.16  .999 822      2.80  .999 9997
1.77  .996 200      2.17  .999 838      2.85  .999 99982
1.78  .996 460      2.18  .999 852      2.90  .999 99990
1.79  .996 704      2.19  .999 864      2.95  .999 99994
1.80  .996 932      2.20  .999 874      3.00  .999 99997
1.81  .997 146      2.21  .999 886
1.82  .997 346      2.22  .999 896
1.83  .997 533      2.23  .999 904
1.84  .997 707      2.24  .999 912
1.85  .997 870      2.25  .999 920
1.86  .998 023      2.26  .999 926
1.87  .998 145      2.27  .999 934
1.88  .998 297      2.28  .999 940
BOOK REVIEW

Fundamentals of Statistics. Truman Lee Kelley. Harvard University Press, 1947; pp. xvi, 765. $10.00.

Reviewed by A. M. Mood

Iowa State College

First, a brief look at the contents: introductory matter, broad classifications 
of types of data, quantitative and qualitative aspects of data, construction of 
tables, charts, and graphs—200 pages; location and scale parameters, and 
moments—75 pages; normal distribution—30 pages; exact sampling distributions 
based on normal theory—5 pages; binomial distribution, goodness of fit tests, 
contingency tables, normal approximation to the distribution of the variance 
ratio, properties of Chi-square—20 pages; correlation and regression—150 pages. 

These first 480 pages constitute the essential part of the book and the part that will be commented on here. But there are 270 more pages, the content of which we shall merely note without comment. There is a chapter of 90 pages entitled “Sundry Statistical Issues and Procedures” which discusses fifteen issues such as periodicity, time series, curve fitting, variance error of a coefficient corrected for attenuation, machine extraction of square roots, and sequential analysis. There follows a chapter of 40 pages devoted to no less than twenty-three topics in mathematics, topics such as: matrices and determinants, the square root transformation, expanding a table, spaces of three or more dimensions, and Fourier series. The remaining 140 pages contain numerical tables, references, various indexes, and a test designed to measure the adequacy of students' mathematical preparation.

This then is another book which deals with the descriptive aspects of statistics.
Despite its title, it omits discussion of distribution theory, sampling theory,
the theory of estimation, tests of hypotheses, or the theory of probability. The
phrase “confidence interval” appears not once, I believe, in the entire 750 pages.
The discussion of Student’s distribution is brief enough to be quoted in its
entirety (page 284): “The t-distribution, shown through the courtesy of Dr.
Philip J. Rulon, in Chart VIIIII, is appropriate for interpreting the significance
of means, differences of means, and of regression coefficients, for small samples— 
say N less than 15. It is the distribution of these statistics computed from small
samples drawn from a parent normal distribution.”

Thus the author denies any value to the developments in the fundamentals of
statistics during the past twenty-five or thirty years. He does this not merely
by implication but in so many words; referring to modern statistical inference,
he writes (page 13): “A still greater weakness is that it is essentially a deductive
procedure and relatively sterile in suggesting new courses—in inspiring creative 
inferences. It is fundamentally a method of proof and not one of invention.”
He is therefore fully aware of his extreme position, and takes great pains to
justify it. His thesis is that the main purpose of statistics is to suggest new
hypotheses to the scientist. In developing this thesis he writes (page 15): “The
physicist observes seemingly irregular changes in x as y changes. He repeats
his experiment, controlling more and more of the conditions, and repeats again
and again, and, if successful, he reaches a law at the end of his work. He has
been using statistics.” But his discussion avoids certain relevant questions.
Why does the physicist repeat the experiment? Why did he perform it in the
first place? Did he suspect before he collected any data that x and y might be
related?

At any rate, the opinion of most present-day statisticians is that the primary 
role of statistics in scientific research is statistical inference. This opinion is 
certainly well-founded in my own experience. Here at Iowa State College the 
Statistical Laboratory is intimately implicated in the research programs of all 
departments—physical, biological, and social. These scientists perform their 
experiments with a specific purpose in mind—usually the estimation of some 
parameters, sometimes the testing of a hypothesis. They never seem to seek 
in a collection of data some new hypothesis by artful selection between the mean,
the mode, the geometric mean, the harmonic mean, and the median.

It must be reported that, even as a book on descriptive statistics, it leaves
much to be desired. The errors usually found in such books are to be found here
as well as many more. There is the long discussion of skewness and kurtosis
based on the false notion that moments are determined by the nature of the
distribution in the neighborhood of the mean. Certain properties of the normal
distribution are imputed to all distributions. Erroneous criteria for selecting 
amongst the many means are given. The universality of the normal distribution 
seems exaggerated; thus, for example, referring to deviations from regressions 
(page 364): “Since the quantities (.to — Xo) are ‘errors’ we may regularly assume 
them to be normally distributed.” Population parameters and their estimates 
are confused. The book contains a great many statements (like the final one in
the section on the Student distribution quoted above) which are so carelessly
written that they have to be counted as errors. Several of the derivations and
arguments are also carelessly constructed; an extreme example of this appears on
page 206: “Is the mean an unbiased statistic? M = (x_a + x_b + x_c + ⋯ +
x_N)/N. Since the various x’s are independent, there are just N degrees of freedom
and M is unbiased.”

Students will likely have difficulty with this book. There is an air of artificiality because of the omission of any discussion of population distributions and the notion of random sampling. Without any background of this kind it is hard to motivate the presentation, and the various topics become isolated. Moments are defined in terms of sample observations, and population moments are defined merely as the limits of these moments as the sample size becomes infinite. To introduce the mean, the author writes essentially: let us consider the function f(b) = [Σx^b/N]^{1/b}. There is no pointing to the middle of a distribution function, or even a sample, or a histogram. The variance is introduced the
same way; one considers the function 211 1 , - 1 , |"/(F -f). Technical terms
are used without definition; for example, in the passage about the mean quoted
above, the student suddenly encounters the word “unbiased” without definition
or previous discussion and must infer its meaning from the context.

Perhaps the best part of the book is the three chapters on correlation and regression. The idea of correlation is here introduced with the discussion of a numerical example, and several other topics are discussed in terms of examples. This part of the book is very exhaustive; every sort of correlation coefficient is discussed, as is every sort of correction to such coefficients. But still the writing is careless, and there is some confusion of ideas. The worst confusion occurs because the distinction between normal and intraclass correlation is never brought out; the discussion hops back and forth between the two ideas with no hint that they are not the same thing. This part of the book, too, is in the style of statistics of thirty years ago; the emphasis is on correlation coefficients rather than regression coefficients.
Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Dr. Leo A. Aroian of Hunter College has been promoted to an assistant
professorship. 

Mr. Carl A. Bennett is now with the General Electric Co., Hanford Engineering
Project, Richland, Washington, as an engineer in the Statistical Division. 

Dr. Arthur B. Brown has been promoted from an Assistant Professor to an 
Associate Professor of Mathematics at Queens College, Flushing, New York. 

Professor Maurice H. Belz has returned to the University of Melbourne,
Carlton, Australia after having spent six months in the United States. 

Dr. Edward E. Cureton, member of Richardson, Bellows, Henry & Co., Inc., industrial psychologists, is now at the United States Naval Air Station, Pensacola, Florida, working on a project with the Navy. The object of this project is to improve ground school training, especially instructor training, in the Naval Air Training Command.

Mr. Eric F. Gardner has accepted an assistant professorship at the School of 
Education, Syracuse University, Syracuse, New York. 

Mr. Lee S. Gunlogson, formerly with the Lumbermens Mutual Casualty Co. 
at Chicago, is now with the Marketing Services Division, Carrier Corporation, 
Syracuse, New York. 

Dr. Theodore E. Harris has accepted a position with the Douglas Aircraft Co.,
Santa Monica, California. 

Dr. Manuel O. Hizon, a former graduate student in the Mathematics Department,
University of Michigan, is now with the Bureau of Banking, Manila,
Philippines, as Actuary-Examiner.

Mr. Julius Lieblein, formerly in the Treasury Department, Washington, D. C.,
has transferred to the Statistical Engineering Laboratory, National Bureau of
Standards, where he is working on problems in acceptance sampling and process
control.

Mr. Jack Moshman, formerly a tutor of mathematics at Queens College, 
Flushing, New York, has been appointed to the staff of the Department of 
Mathematics, University of Tennessee. 

Dr. Horace W. Norton, formerly with the U. S. Weather Bureau, Washington,
D. C., as meteorologist, is now at Oak Ridge, Tennessee. His position there is
to study the application of statistics to reliability of weighings and analyses in
connection with accountability for source and fissionable materials.

Mr. Emil D. Schell of the Bureau of Labor Statistics has been appointed Chief
of the Mathematics and Electronic Computer Branch in the Office of the
Comptroller, United States Air Forces.

Miss Bernice Scherl, formerly with the Schenley Research Institute, Inc., New
York, has accepted a position as Statistician, Shell Oil Co., New York.
Dr. Irving E. Segal, who has been an assistant at the Institute for Advanced
Study at Princeton, New Jersey, has accepted an assistant professorship in the 
Mathematics Department, University of Chicago. 

Miss Rosedith Sitgreaves, assistant statistician in the United States Public 
Health Service, has returned to her position in Washington after doing advanced 
study at Columbia University. 

Dr. John E. Walsh, who received his doctor's degree in mathematics from 
Princeton University last October, is now employed by Douglas Aircraft Co., 
Inc. of Santa Monica, California. 

Mr. Winfred P. Wilson, a former graduate student at the University of
Michigan, has accepted an assistant professorship at the University of Houston,
Houston, Texas, in the Department of Mathematics.


Announcement has been received of a new journal, The British Journal of 
Psychology, Statistical Section, which is published by the Council of the British
Psychological Society. The editors are Professor Sir Cyril Burt and Professor 
Godfrey Thomson. The first issue has been published and later issues will be 
published as material warrants. Subscriptions and inquiries should be sent to 
the University of London Press, Ltd., Warwick Square, London, E. C. 4. 


Announcement of Navy Department Joint Board of U. S. Civil Service Examiners

Implementing its scientific research and development program both
geographically and in new fields of endeavor, the Navy Department is currently
expanding three comparatively new, permanent laboratories in California.
Heretofore, the Navy Department's scientific centers have been concentrated in
the eastern and eastern seaboard areas.

Two of the laboratories have been established as the logical outgrowth of 
programs carried on by universities during the war. The Naval Ordnance Test 
Station, China Lake (formerly Inyokern), California, 160 miles from Los Angeles,
was originally an activity of the California Institute of Technology. Its present 
program involves research, development and test work with ordnance equipment 
and explosives. The Navy Electronics Laboratory at San Diego, California is 
the outgrowth of work done by the University of California. It is concerned 
with research, testing and development of electronic control devices, detection
equipment, instrumentation equipment and training aids. The Naval Air 
Missile Test Center at Point Mugu on the coast of California, 60 miles north of
Los Angeles, was established when the need for an installation became apparent 
as the result of the Navy Department’s activities on guided missiles. The Test 
Center's activities are concerned with flight and laboratory testing and evaluation
of guided missiles and their components.

Each of the establishments has current need for qualified personnel in a variety
of scientific fields to staff its laboratories. Recently completed at the Naval
Ordnance Test Station is Michelson Laboratory at a cost of $6,000,000. Many
more millions of dollars have been spent in equipment and facilities. Additional
construction and facilities are planned for both the Air Missile Test Center and
the Electronics Laboratory.

The work programs of the laboratories are planned, directed and accomplished
under the direction of an outstanding staff of civilian scientists. Extensive use
is made of the council method of operation. Constant liaison is maintained with
other research organizations, universities, scientific associations, and outstanding
authorities throughout the nation.

Professional positions are in the career service of the Federal government under
Civil Service laws. Examinations are now open in the three scientific
establishments in the following professional fields: Chemist, Mathematician,
Metallurgist, Meteorologist, Physicist, Statistician, Scientific Research
Administrator and Scientific Staff Assistant.

Examinations are also open in the following branches of the Engineering 
profession: Aeronautical, Chemical, Civil, Electrical, Electronics, General, 
Industrial, Material, Mechanical, Metallurgical, Ordnance, Safety and Structural. 

Salaries for most of the positions range from $3397 to $9975 per annum.
Salaries are predicated on the level of ability, knowledge and experience required
to effectively discharge the duties of a specific position.

Further information may be obtained from the Navy Department Joint Board 
of U. S. Civil Service Examiners, 1030 East Green Street, Pasadena 1, California. 


Reorganization of Philosophy of Science Association 

The Philosophy of Science Association has been reorganized with Philipp Frank
of Harvard University as President; C. West Churchman of Wayne University,
Detroit, as Secretary-Treasurer.

The following are members of the Governing Committee: Gustav Bergmann,
State University of Iowa; Thomas A. Cowan, Wayne University; Clyde
Kluckhohn, Harvard University; Sebastian Littauer, Columbia University; F. S. C.
Northrop, Yale University.

The official journal of the Association is the Philosophy of Science of which 
Professor C. West Churchman is Acting Editor. Manuscripts should be sent to 
the Acting Editor. 

Applications for membership may be sent to the Secretary-Treasurer. Dues
are $5.00 a year. 

The Association encourages the establishment of local groups in the philosophy
of science. 


Columbia University Conference on Industrial Experimentation. 

The School of Engineering of Columbia University in the City of New York
announces an Intersession Five-day Intensive Training Conference on Industrial
Experimentation to be offered September 14-18, 1948 by the Department of
Industrial Engineering in cooperation with the Department of Mathematical
Statistics of the Graduate Faculty of Political Science.

The lecturing will be shared by Professors S. B. Littauer and J. Wolfowitz and
a staff of special lecturers drawn from industry.

A descriptive brochure will be ready for mailing in the latter part of July. 
For further details, interested persons may communicate directly with Professor 
S. B. Littauer, Department of Industrial Engineering, Columbia University, New 
York 27, New York.


New Members 

The following persons have been elected to membership in the Institute
(December 1, 1947 to February 28, 1948):

Angulo, Walter J., B.E. (Johns Hopkins Univ.) Graduate student at Johns Hopkins
University, 5229 Beaufort Ave., Baltimore IB, Maryland.

Beard, Helen P., Ph.D. (Mass. Institute of Tech.) Assistant Professor of Mathematics,
Newcomb College, New Orleans 18, Louisiana.

Blomquist, Nils G., (Univ. of Stockholm) Statistician, Sverige Reinsurance Company,
Aladdinsvagen 47, Smedslatten, Sweden.

Bodwell, Charles A., M.S. (Univ. of Michigan) Graduate student at the University of
Michigan, Box 773, West Lodge, Ypsilanti, Michigan.

Burnett, Jean, M.S. (Mich. State College) Instructor in Mathematics, Michigan State
College, 702 Cherry Lane, East Lansing, Michigan.

Burton, Robert E., Student at Michigan University, 18S9 Atkinson Avenue, Detroit 2,
Michigan.

Byrd, Paul F., M.S. (Univ. of Chicago) Meteorologist, U.S.A.F., Weather Detachment,
Lockbourne Air Base, Columbus 17, Ohio.

Cernuschi, Felix, Ph.D. (Univ. of Cambridge) Professor at the University of Montevideo,
Asociacion Uruguaya de Estadistica, Av. Agraciada 1464, Montevideo, Uruguay.

Connor, William S., Jr., M.A. (Univ. of North Carolina) Associate Professor of Economics,
University of Kentucky, College of Commerce, Lexington, Kentucky.

Dalenius, Tore, Fil. kand., Hastholmsvagen 16, Stockholm, Sweden.

Davis, Roderic C., M.S. (Calif. Institute of Tech.) P-6 Mathematician, Head of Assessment
Section, P. O. Box N-467, N.O.T.S., Inyokern, California.

Dvoretzky, Aryeh, Ph.D. (Hebrew Univ., Jerusalem) Research Fellow at Hebrew
University, Jerusalem, c/o American Friends Hebrew University, S East 89th St., New York.

Gardner, Robert S., M.S. (Tulane Univ.) Instructor in Statistics, Mathematics Department
at Ohio State University, 215 W. Eleventh Avenue, Columbus, Ohio.

Goins, Mary, M.S. (Univ. of Mich.) Assistant Professor of Mathematics, Marshall College,
P5 Ninth Avenue, Huntington, West Virginia.

Hratz, Joseph A., B.A. (St. Ambrose College) Instructor of Mathematics, St. Ambrose
College, Davenport, Iowa.

Kelly, Harriet J., Ph.D. (Univ. of Iowa) Research Associate, Head of Statistical Dept.,
Children's Fund of Michigan, 600 Frederick, Detroit 2, Michigan.

Kincaid, Wilfred M., Ph.D. (Brown Univ.) Instructor in Mathematics, University of
Michigan, Ann Arbor, Michigan.

Kish, Leslie, B.S. (College of the City of N. Y.) Senior Sampling Statistician, 701 Mt.
Pleasant, Ann Arbor, Michigan.

Lehr, Marguerite, Ph.D. (Bryn Mawr, Pa.) Associate Professor of Mathematics, Bryn Mawr
College, Cartref, Bryn Mawr, Pennsylvania.

Loizelier, Enrique Blanco, M.A. (Madrid Univ.) Professor of Statistics, Faculty of
Economics, Madrid University, Neman If., Madrid, Spain.

Lorenzo, Cesar M., M.A. (American Univ., Wash., D. C.) Statistician, Food and Agriculture
Organization of the United Nations, 1736 De Sales Street, N.W., Washington 6, D. C.

Lott, Fred W., Jr., M.A. (Univ. of Mich.) Teaching Fellow at the University of Michigan,
1SSS Malden Conti, Willow Run, Michigan.

Mantel, Nathan, B.S. (City College of New York) Biostatistician, U. S. Public Health
Service, Jii£t Lily Ponds Dr., N.E., Washington If), D. C.

Marrian, Dixon M., A.M. (Columbia Univ.) Master at the Gilman Country School,
Baltimore, Md., 150ft Shadyside Road, Baltimore 18, Maryland.

Martins, Octavio Augusto L., M.A. (Columbia Univ.) Tecnico de Educacao, Department of
National Education, Rio de Janeiro, Rua Figueiredo de Magalhaes ft. Apt. 70S, Rio de
Janeiro, Brazil.

Meyer, Herbert Albert, Ph.D. (Univ. of Iowa) Associate Professor of Mathematics, %016
West Columbia, Gainesville, Florida.

Nikitich, Nicholas, B.O.S. (New York Univ.) Timekeeper, 20 Featherbed Lane, New York
6%, New York.

Oakland, Gail Barker, M.A. (Univ. of Minnesota) Associate Professor of Statistics,
University of Manitoba, Winnipeg, Canada.

Olkin, Ingram, B.S. (College of City of N. Y.) Graduate Student at Columbia University,
SJ5 Fort Washington Ave., New York 82, New York.



REPORT ON THE NEW YORK MEETING OF THE INSTITUTE 


The thirty-third meeting of the Institute of Mathematical Statistics was held 
at Columbia University, New York City, New York on Wednesday afternoon 
and Thursday, April 14 and 15, 1948. The meeting was attended by 158 persons,
including the following 78 members of the Institute: 

M. Afzal, L. A. Aroian, R. M. Auor, W. D. Baten, R. E. Bechhofer, J. H. Bushey, J. M.
Cameron, B. H. Camp, G. C. Campbell, S. D. Canter, Manuel Cynamon, Tore Dalenius,
J. F. Daly, J. L. Doob, C. W. Dunnett, Aryeh Dvoretzky, Churchill Eisenhart, Benjamin
Epstein, M. W. Eudey, D. A. Fraser, M. A. Geisler, Mary Goins, H. H. Goode, E. J.
Gumbel, M. H. Hansen, Mina Haskind, L. H. Herback, S. M. Ikhtiar-ul-Mulk, Seymour
Jablon, L. F. Knudsen, Jack Laderman, Howard Levene, S. B. Littauer, F. M. Lord,
Irving Lorge, Eugene Lukacs, W. G. Madow, Sophie Marcuse, Robert Mirsky, E. B.
Mode, D. J. Morrow, Frederick Mosteller, D. N. Nanda, P. M. Neurath, G. E. Noether,
M. L. Norden, Ingram Olkin, P. S. Olmstead, A. L. O'Toole, Katharine Pease, E. J.
Pitman, W. A. Reynolds, J. S. Rhodes, H. E. Robbins, H. G. Romig, Ernest Rubin, Herman
Rubin, P. J. Rulon, Frank Saidel, G. R. Seth, M. A. Schlorok, S. S. Shrikhande, Rosedith
Sitgreaves, Milton Sobel, Emma Spaney, F. F. Stephan, B. R. Suydam, Henry
Teicher, J. W. Tukey, A. Wald, H. M. Walker, J. E. Walsh, S. S. Wilks, Dzung-shu Wei,
Lionel Weiss, Jacob Wolfowitz, C. A. Wright, Mohammad Yusuf.

The Wednesday afternoon session, Professor S. B. Littauer of Columbia
University presiding, was devoted to the following two invited addresses:

1. Incomplete Block Designs

Professor R. C. Bose, Calcutta University and the University of North Carolina
2. Non-Parametric Inference

Professor E. J. G. Pitman, University of Tasmania and Columbia University

The Thursday morning session, Professor Hobart Bushey of Hunter College
presiding, consisted of a Symposium on Scales of Measurement at which two
invited papers,

1. The Development of Psychological Scaling Techniques
Professor Harold Gulliksen, Princeton University

2. A Generalized Model for Scales

Professor Paul Lazarsfeld, Columbia University

were followed by prepared discussion by Professors Phillip Rulon of Harvard
University and John Tukey of Princeton University.

The Thursday afternoon session, Dr. Harry G. Romig of Bell Telephone
Laboratories presiding, was devoted to the following contributed papers:

1. Optimum Character of the Sequential Probability Ratio Test

Professors Abraham Wald and Jacob Wolfowitz, Columbia University
2. Multi-parameter Sequential Estimation
Mr. G. R. Seth, Columbia University
3. The Distribution of a Definite Quadratic Form
Professor Herbert Robbins, University of North Carolina 

4. The Moments and Cumulants of the Product of 2, 3, or 4 Dependent Variables
(Preliminary Report)

Professor Leo A. Aroian, Hunter College

5. Generalization to N Dimensions of Inequalities of the Tchebycheff Type
Professor Burton H. Camp, Wesleyan University 

6. On the Power Function of a Sign Test Formed by Using Subsamples
Dr. John E. Walsh, Project Rand

7. The Distribution of T², a Multivariate Generalization of the F-test
Miss Dorothy J. Morrow, University of North Carolina.

8. Approximate Confidence Points (Preliminary Report) 

Professor John Tukey, Princeton University

At all of the sessions there was active discussion from the floor. 

On Wednesday evening members and guests had dinner at the Men’s Faculty 
Club. 


S. B. Littauer 
Assistant Secretary 
A CLASS OF STATISTICS WITH ASYMPTOTICALLY NORMAL

DISTRIBUTION¹

By Wassily Hoeffding
Institute of Statistics, University of North Carolina 


1. Summary. Let $X_1, \dots, X_n$ be $n$ independent random vectors,
$X_\nu = (X_\nu^{(1)}, \dots, X_\nu^{(r)})$, and $\Phi(x_1, \dots, x_m)$ a function of
$m\ (\le n)$ vectors $x_\nu = (x_\nu^{(1)}, \dots, x_\nu^{(r)})$. A statistic of the form
$U = \sum'' \Phi(X_{\alpha_1}, \dots, X_{\alpha_m}) / n(n-1)\cdots(n-m+1)$, where the sum
$\sum''$ is extended over all permutations $(\alpha_1, \dots, \alpha_m)$ of $m$ different
integers, $1 \le \alpha \le n$, is called a U-statistic. If $X_1, \dots, X_n$ have the same
(cumulative) distribution function (d.f.) $F(x)$, $U$ is an unbiased estimate of the
population characteristic

$\theta(F) = \int \cdots \int \Phi(x_1, \dots, x_m)\, dF(x_1) \cdots dF(x_m).$

$\theta(F)$ is called a regular functional of the d.f. $F(x)$. Certain optimal properties
of U-statistics as unbiased estimates of regular functionals have been established
by Halmos [9] (cf. Section 4).

The variance of a U-statistic as a function of the sample size $n$ and of certain
population characteristics is studied in Section 5.

It is shown that if $X_1, \dots, X_n$ have the same distribution and $\Phi(x_1, \dots, x_m)$
is independent of $n$, the d.f. of $\sqrt{n}(U - \theta)$ tends to a normal d.f. as $n \to \infty$
under the sole condition of the existence of $E\Phi^2(X_1, \dots, X_m)$. Similar results
hold for the joint distribution of several U-statistics (Theorems 7.1 and 7.2),
for statistics $U'$ which, in a certain sense, are asymptotically equivalent to $U$
(Theorems 7.3 and 7.4), for certain functions of statistics $U$ or $U'$ (Theorem 7.5)
and, under certain additional assumptions, for the case of the $X_\nu$'s having
different distributions (Theorems 8.1 and 8.2). Results of a similar character,
though under different assumptions, are contained in a recent paper by
von Mises [18] (cf. Section 7).

Examples of statistics of the form $U$ or $U'$ are the moments, Fisher's k-statistics,
Gini's mean difference, and several rank correlation statistics such as Spearman's
rank correlation and the difference sign correlation (cf. Section 9).
Asymptotic power functions for the non-parametric tests of independence based
on these rank statistics are obtained. They show that these tests are not
unbiased in the limit (Section 9f). The asymptotic distribution of the coefficient
of partial difference sign correlation which has been suggested by Kendall also
is obtained (Section 9h).


2. Functionals of distribution functions. Let $F(x) = F(x^{(1)}, \dots, x^{(r)})$ be
an $r$-variate d.f. If to any $F$ belonging to a subset $\mathfrak{D}$ of the set of all d.f.'s in the
$r$-dimensional Euclidean space is assigned a quantity $\theta(F)$, then $\theta(F)$ is called a

¹ Research under a contract with the Office of Naval Research for development of
multivariate statistical theory.
functional of $F$, defined on $\mathfrak{D}$. In this paper the word functional will always
mean functional of a d.f.

An infinite population may be considered as completely determined by its
d.f., and any numerical characteristic of an infinite population with d.f. $F$ that
is used in statistics is a functional of $F$. A finite population, or sample, of size $n$
is determined by its d.f., $S(x)$ say, and its size $n$. $n$ itself is not a functional of $S$,
since two samples of different size may have the same d.f.

If $S(x^{(1)}, \dots, x^{(r)})$ is the d.f. of a finite population, or a sample, consisting
of $n$ elements

(2.1) $x_\alpha = (x_\alpha^{(1)}, \dots, x_\alpha^{(r)}), \quad (\alpha = 1, \dots, n),$

then $nS(x^{(1)}, \dots, x^{(r)})$ is the number of elements $x_\alpha$ such that

$x_\alpha^{(1)} \le x^{(1)}, \dots, x_\alpha^{(r)} \le x^{(r)}.$

Since $S(x^{(1)}, \dots, x^{(r)})$ is symmetric in $x_1, \dots, x_n$, and retains its value for a
sample formed from the sample (2.1) by adding one or more identical samples,
the same two properties hold true for a sample functional $\theta(S)$. Most statistics
in current use are functions of $n$ and of functionals of the sample d.f.

A random sample $(X_1, \dots, X_n)$ is a set of $n$ independent random vectors

(2.2) $X_\alpha = (X_\alpha^{(1)}, \dots, X_\alpha^{(r)}), \quad (\alpha = 1, \dots, n).$

For any fixed values $x^{(1)}, \dots, x^{(r)}$, the d.f. $S(x^{(1)}, \dots, x^{(r)})$ of a random sample
is a random variable. The functional $\theta(S)$, where $S$ is the d.f. of the random
sample, is itself a random variable, and may be called a random functional.

A remarkable application of the theory of functionals to functionals of d.f.'s
has been made by von Mises [18], who considers the asymptotic distributions of
certain functionals of sample d.f.'s. (Cf. also Section 7.)


3. Unbiased estimation and regular functionals. Consider a functional
$\theta = \theta(F)$ of the $r$-variate d.f. $F(x) = F(x^{(1)}, \dots, x^{(r)})$, and suppose that for some
sample size $n$, $\theta$ admits an unbiased estimate for any d.f. $F$ in $\mathfrak{D}$. That is, if
$X_1, \dots, X_n$ are $n$ independent random vectors with the same d.f. $F$, there exists
a function $\varphi(x_1, \dots, x_n)$ of $n$ vector arguments (2.1) such that the expected
value of $\varphi(X_1, \dots, X_n)$ is equal to $\theta(F)$, or

(3.1) $\int \cdots \int \varphi(x_1, \dots, x_n)\, dF(x_1) \cdots dF(x_n) = \theta(F)$

for every $F$ in $\mathfrak{D}$. Here and in the sequel, when no integration limits are
indicated, the integral is extended over the entire space of $x_1, \dots, x_n$. The integral
is understood in the sense of Stieltjes-Lebesgue.

The estimate $\varphi(x_1, \dots, x_n)$ of $\theta(F)$ is called unbiased over $\mathfrak{D}$.

A functional $\theta(F)$ of the form (3.1) will be referred to as regular over $\mathfrak{D}$.²

² This is an adaptation to functionals of d.f.'s of the term "regular functional" used by
Volterra [21].
Thus, the functionals regular over $\mathfrak{D}$ are those admitting an unbiased estimate
over $\mathfrak{D}$.

If $\theta(F)$ is regular over $\mathfrak{D}$, let $m\ (\le n)$ be the smallest sample size for which there
exists an unbiased estimate $\Phi(x_1, \dots, x_m)$ of $\theta$ over $\mathfrak{D}$:

(3.2) $\theta(F) = \int \cdots \int \Phi(x_1, \dots, x_m)\, dF(x_1) \cdots dF(x_m)$

for any $F$ in $\mathfrak{D}$. Then $m$ will be called the degree over $\mathfrak{D}$ of the regular
functional $\theta(F)$.

If the expected value of $\varphi(X_1, \dots, X_n)$ is equal to $\theta(F)$ whenever it exists,
$\varphi(x_1, \dots, x_n)$ will be called a distribution-free unbiased estimate (d.-f.u.e.) of $\theta(F)$.
The degree of $\theta(F)$ over the set $\mathfrak{D}_0$ of d.f.'s $F$ for which the right hand side of (3.1)
exists will be simply termed the degree of $\theta(F)$.

A regular functional of degree 1 over $\mathfrak{D}$ is called a linear regular functional
over $\mathfrak{D}$. If $\theta(F)$ has the same value for all $F$ in $\mathfrak{D}$, $\theta(F)$ may be termed a regular
functional of degree zero over $\mathfrak{D}$.

Any function $\Phi(x_1, \dots, x_m)$ satisfying (3.2) will be referred to as a kernel of
the regular functional $\theta(F)$.

For any regular functional $\theta(F)$ there exists a kernel $\Phi_0(x_1, \dots, x_m)$ symmetric
in $x_1, \dots, x_m$. For if $\Phi(x_1, \dots, x_m)$ is a kernel of $\theta(F)$,

(3.3) $\Phi_0(x_1, \dots, x_m) = \frac{1}{m!} \sum \Phi(x_{\alpha_1}, \dots, x_{\alpha_m}),$

where the sum is taken over all permutations $(\alpha_1, \dots, \alpha_m)$ of $(1, \dots, m)$, is a
symmetric kernel of $\theta(F)$.
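The symmetrization (3.3) is mechanical enough to sketch in a few lines. The following Python is a minimal illustration, assuming univariate observations ($r = 1$) and a kernel given as an ordinary callable of $m$ numbers; the helper name `symmetrize` is hypothetical. Applied to the non-symmetric kernel $x_1^2 - x_1 x_2$, whose expectation is the variance, it returns $(x_1 - x_2)^2/2$.

```python
from itertools import permutations
from math import factorial


def symmetrize(phi, m):
    """Return Phi_0 of (3.3): the mean of phi over all m! orderings of its arguments."""
    def phi0(*xs):
        if len(xs) != m:
            raise ValueError(f"expected {m} arguments, got {len(xs)}")
        total = sum(phi(*(xs[i] for i in perm)) for perm in permutations(range(m)))
        return total / factorial(m)
    return phi0


# Non-symmetric kernel whose expected value is E X^2 - (E X)^2, i.e. the variance.
phi = lambda x1, x2: x1 ** 2 - x1 * x2
phi0 = symmetrize(phi, 2)
print(phi0(3.0, 1.0))  # 2.0, which equals (3.0 - 1.0) ** 2 / 2
```

The cost is $m!$ kernel evaluations per call, which is harmless for the small degrees that occur in practice.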

If $\theta_1(F)$ and $\theta_2(F)$ are two regular functionals of degrees $m_1$ and $m_2$ over $\mathfrak{D}$,
then the sum $\theta_1(F) + \theta_2(F)$ and the product $\theta_1(F)\theta_2(F)$ are regular functionals
of degrees $\le m = \max(m_1, m_2)$ and $\le m_1 + m_2$, respectively, over $\mathfrak{D}$. For if
$\Phi_i(x_1, \dots, x_{m_i})$ is a kernel of $\theta_i(F)$, $(i = 1, 2)$, then

$\theta_1(F) + \theta_2(F) = \int \cdots \int \{\Phi_1(x_1, \dots, x_{m_1}) + \Phi_2(x_1, \dots, x_{m_2})\}\, dF(x_1) \cdots dF(x_m)$

and

$\theta_1(F)\theta_2(F) = \int \cdots \int \Phi_1(x_1, \dots, x_{m_1})\,\Phi_2(x_{m_1+1}, \dots, x_{m_1+m_2})\, dF(x_1) \cdots dF(x_{m_1+m_2}).$

More generally, a polynomial in regular functionals is itself a regular functional.
Examples of linear regular functionals are the moments about the origin,

$\mu'_{\nu_1, \dots, \nu_r} = \int (x^{(1)})^{\nu_1} \cdots (x^{(r)})^{\nu_r}\, dF(x^{(1)}, \dots, x^{(r)}).$
A moment about the mean is a polynomial in moments $\mu'$ about 0, and hence a
regular functional over the set $\mathfrak{D}$ of d.f.'s for which it exists (cf. Halmos [9]).
For instance, the variance of $X^{(1)}$,

$\sigma^2 = \int\int \{(x_1^{(1)})^2 - x_1^{(1)} x_2^{(1)}\}\, dF(x_1^{(1)})\, dF(x_2^{(1)}),$

is a regular functional of degree 2. A symmetrical kernel of $\sigma^2$ is $(x_1^{(1)} - x_2^{(1)})^2/2$.
If $\mathfrak{D}$ is the set of univariate d.f.'s with mean $\mu$ and existing second moment,
$\sigma^2$ is a linear regular functional of $F$ over $\mathfrak{D}$, since then we have

$\sigma^2 = \int (x^{(1)} - \mu)^2\, dF(x^{(1)}).$

The function

$s^2 = \frac{1}{2n(n-1)} \sum\sum_{\alpha \ne \beta} (x_\alpha^{(1)} - x_\beta^{(1)})^2 = \frac{1}{n-1} \sum_{\alpha=1}^{n} \Big(x_\alpha^{(1)} - \frac{1}{n} \sum_{\beta=1}^{n} x_\beta^{(1)}\Big)^2$

is a distribution-free unbiased estimate of $\sigma^2$. The function




is known to be an unbiased estimate of $\sigma$ over the set of univariate normal d.f.'s,
but it is not a d.-f.u.e.
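The distribution-free estimate $s^2$ of this section can be written either as an average of the symmetric kernel $(x_\alpha - x_\beta)^2/2$ over ordered pairs $\alpha \ne \beta$ or in the familiar form with deviations from the sample mean. A minimal Python sketch for univariate data confirming that the two forms agree; the function names are hypothetical.

```python
def s2_pairwise(xs):
    """Average of the symmetric kernel (x_a - x_b)**2 / 2 over ordered pairs a != b."""
    n = len(xs)
    total = sum((xs[a] - xs[b]) ** 2 for a in range(n) for b in range(n) if a != b)
    return total / (2 * n * (n - 1))


def s2_deviation(xs):
    """Usual form: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


data = [2.0, 4.0, 4.0, 5.0, 7.0]
print(s2_pairwise(data), s2_deviation(data))  # both equal 3.3 up to rounding
```

The pairwise form needs $n(n-1)$ kernel evaluations, the deviation form only $O(n)$ work; the point of the pairwise form is that it exhibits $s^2$ directly as an average of a kernel.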


4. U-statistics. Let $x_1, \dots, x_n$ be a sample of $n$ vectors (2.1) and
$\Phi(x_1, \dots, x_m)$ a function of $m\ (\le n)$ vector arguments. Consider the function
of the sample,

(4.1) $U = U(x_1, \dots, x_n) = \frac{1}{n(n-1)\cdots(n-m+1)} \sum'' \Phi(x_{\alpha_1}, \dots, x_{\alpha_m}),$

where $\sum''$ stands for summation over all permutations $(\alpha_1, \dots, \alpha_m)$ of $m$ integers
such that

(4.2) $1 \le \alpha_i \le n, \quad \alpha_i \ne \alpha_j \text{ if } i \ne j, \quad (i, j = 1, \dots, m).$

$U$ is the average of the values of $\Phi$ in the set of ordered subsets of $m$ members
of the sample (2.1). $U$ is symmetric in $x_1, \dots, x_n$.

Any statistic of the form (4.1) will be called a U-statistic. Any function
$\Phi(x_1, \dots, x_m)$ satisfying (4.1) will be referred to as a kernel of the statistic $U$.
If $\Phi(x_1, \dots, x_m)$ is a kernel of a regular functional $\theta(F)$ defined on a set $\mathfrak{D}$,
then $U$ is an unbiased estimate of $\theta(F)$ over $\mathfrak{D}$:

(4.3) $\theta(F) = \int \cdots \int U(x_1, \dots, x_n)\, dF(x_1) \cdots dF(x_n)$

for every $F$ in $\mathfrak{D}$.
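Equation (4.3) can be checked exactly for a small discrete parent distribution by enumerating every possible sample. The sketch below is a minimal Python illustration under stated assumptions: univariate observations ($r = 1$), a d.f. given as a hypothetical `{value: probability}` dictionary, and exact rational arithmetic via `fractions.Fraction`. It evaluates $U$ through the ordered-permutation form (4.1), which does not require a symmetric kernel.

```python
from fractions import Fraction
from itertools import permutations, product


def u_statistic_ordered(phi, m, xs):
    """(4.1): mean of phi over all ordered m-tuples of distinct sample indices."""
    tuples = list(permutations(range(len(xs)), m))
    return Fraction(sum(phi(*(xs[i] for i in t)) for t in tuples), len(tuples))


def expected_u(phi, m, n, dist):
    """Exact E{U} over all n-samples drawn independently from dist = {value: probability}."""
    total = Fraction(0)
    for sample in product(dist, repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= dist[x]
        total += prob * u_statistic_ordered(phi, m, sample)
    return total


# Parent d.f. with mean 1 and variance 3/2.
dist = {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}
phi = lambda x1, x2: x1 * x1 - x1 * x2  # non-symmetric kernel of the variance
print(expected_u(phi, 2, 3, dist))  # 3/2
```

The enumeration grows like $k^n$ for a $k$-point distribution, so this is only a check on tiny cases, not a way to compute anything in practice.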
For $n = m$, $U$ reduces to the symmetric kernel (3.3) of $\theta(F)$.

From a recent paper by Halmos [9] it follows for the case of univariate d.f.'s
($r = 1$) that if $\theta(F)$ is a regular functional of degree $m$ over a set $\mathfrak{D}$ containing all purely
discontinuous d.f.'s, $U$ is the only unbiased estimate over $\mathfrak{D}$ which is symmetric
in $x_1, \dots, x_n$, and $U$ has the least variance among all unbiased estimates
over $\mathfrak{D}$.

These results and the proofs given by Halmos can easily be extended to the 
multivariate case (r > 1). 

Combining (3.3) and (4.1) we may write a U-statistic in the form

(4.4) $U(x_1, \dots, x_n) = \binom{n}{m}^{-1} \sum' \Phi_0(x_{\alpha_1}, \dots, x_{\alpha_m}),$

where the kernel $\Phi_0$ is symmetric in its $m$ vector arguments and the sum $\sum'$ is
extended over all subscripts $\alpha$ such that

$1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_m \le n.$
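The forms (4.1) and (4.4) give the same number, but (4.4) evaluates only $\binom{n}{m}$ kernel values instead of $n(n-1)\cdots(n-m+1)$, a saving by the factor $m!$. A minimal Python sketch for univariate data, with hypothetical function names and the illustrative non-symmetric kernel $x_1 x_2^2$ (expected value $\mu'_1 \mu'_2$) together with its symmetrization (3.3):

```python
from itertools import combinations, permutations
from math import comb


def u_ordered(phi, m, xs):
    """(4.1): average over ordered m-tuples of distinct indices; phi need not be symmetric."""
    tuples = list(permutations(range(len(xs)), m))
    return sum(phi(*(xs[i] for i in t)) for t in tuples) / len(tuples)


def u_subsets(phi0, m, xs):
    """(4.4): average of a symmetric kernel over the C(n, m) index sets a_1 < ... < a_m."""
    n = len(xs)
    return sum(phi0(*(xs[i] for i in idx)) for idx in combinations(range(n), m)) / comb(n, m)


phi = lambda x1, x2: x1 * x2 ** 2                          # not symmetric
phi0 = lambda x1, x2: (x1 * x2 ** 2 + x2 * x1 ** 2) / 2    # its symmetrization (3.3)
data = [1, 2, 3]
print(u_ordered(phi, 2, data), u_subsets(phi0, 2, data))  # 8.0 8.0
```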


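Another statistic frequently used for estimating $\theta(F)$ is $\theta(S)$, where $S = S(x)$
is the d.f. of the sample (2.1). If $S$ is substituted for $F$ in (3.2), we have

(4.5) $\theta(S) = \int \cdots \int \Phi(x_1, \dots, x_m)\, dS(x_1) \cdots dS(x_m) = \frac{1}{n^m} \sum_{\alpha_1=1}^{n} \cdots \sum_{\alpha_m=1}^{n} \Phi(x_{\alpha_1}, \dots, x_{\alpha_m}).$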

In particular, the sample moments have this form; their kernel $\Phi$ is obtained
by the method described in Section 3.
If $m = 1$, $\theta(S) = U$. If $m = 2$,

$\theta(S) = \frac{n-1}{n}\, U + \frac{1}{n} \Big\{\frac{1}{n} \sum_{\alpha=1}^{n} \Phi(x_\alpha, x_\alpha)\Big\},$


and $\theta(S)$ is a linear function of U-statistics with coefficients depending on $n$.
This is easily seen to be true for any $m$. In general $\theta(S)$ is not an unbiased
estimate of $\theta(F)$. If, however, the expected value of $\theta(S)$ exists for every $F$ in $\mathfrak{D}$,
we have

$E\{\theta(S)\} = \theta(F) + O(n^{-1}),$

and the estimate $\theta(S)$ of $\theta(F)$ may be termed unbiased in the limit over $\mathfrak{D}$.
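For $m = 2$ the relation between $\theta(S)$ and $U$ can be verified with exact arithmetic. The minimal Python sketch below uses hypothetical function names and `fractions.Fraction`; it computes $\theta(S)$ directly from (4.5), computes $U$ from (4.1), and evaluates the right-hand side $\frac{n-1}{n}U + \frac{1}{n}\cdot\frac{1}{n}\sum_\alpha \Phi(x_\alpha, x_\alpha)$. With the variance kernel the diagonal terms vanish and $\theta(S)$ is the divisor-$n$ sample variance, which shows the $O(n^{-1})$ bias concretely.

```python
from fractions import Fraction


def v_statistic(phi, xs):
    """theta(S) of (4.5) for m = 2: mean of phi over all n**2 ordered pairs, repeats allowed."""
    n = len(xs)
    return Fraction(sum(phi(a, b) for a in xs for b in xs), n * n)


def u_statistic(phi, xs):
    """(4.1) for m = 2: mean of phi over the n(n-1) ordered pairs of distinct indices."""
    n = len(xs)
    total = sum(phi(xs[i], xs[j]) for i in range(n) for j in range(n) if i != j)
    return Fraction(total, n * (n - 1))


def v_from_u(phi, xs):
    """Right-hand side of the m = 2 identity: (n-1)/n * U + (1/n) * mean of phi(x_a, x_a)."""
    n = len(xs)
    diag_mean = Fraction(sum(phi(x, x) for x in xs), n)
    return Fraction(n - 1, n) * u_statistic(phi, xs) + diag_mean / n


phi = lambda a, b: a * b          # kernel of the squared mean
data = [1, 2, 3]
print(v_statistic(phi, data), u_statistic(phi, data), v_from_u(phi, data))  # 4 11/3 4
```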

Numerous statistics in current use have the form of, or can be expressed in
terms of, U-statistics. From what was said above about moments as regular
functionals, it is easy to obtain U-statistics which are d.-f.u.e.'s of the moments
about the mean of any order (cf. Halmos [9]). Fisher's k-statistics are
U-statistics, as follows from their definition as unbiased estimates of the cumulants,
symmetric in the sample values. Another example is Gini's mean difference

$\frac{1}{n(n-1)} \sum\sum_{\alpha \ne \beta} |x_\alpha - x_\beta|.$
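Gini's mean difference is the U-statistic with kernel $|x_1 - x_2|$. A minimal Python sketch (hypothetical function names, univariate data) computing it straight from the definition and, as an alternative that is not part of the text, in $O(n \log n)$ time from the order statistics $x_{(0)} \le \cdots \le x_{(n-1)}$ via $\sum\sum_{\alpha \ne \beta} |x_\alpha - x_\beta| = 2\sum_i (2i - n + 1)\, x_{(i)}$:

```python
def gini_pairs(xs):
    """Gini's mean difference as a U-statistic: mean |x_a - x_b| over ordered pairs a != b."""
    n = len(xs)
    total = sum(abs(xs[a] - xs[b]) for a in range(n) for b in range(n) if a != b)
    return total / (n * (n - 1))


def gini_sorted(xs):
    """Same value from the sorted sample: sum over a != b of |x_a - x_b| = 2 sum_i (2i - n + 1) x_(i)."""
    n = len(xs)
    ys = sorted(xs)
    total = 2 * sum((2 * i - n + 1) * y for i, y in enumerate(ys))
    return total / (n * (n - 1))


print(gini_pairs([1, 2, 3]), gini_sorted([1, 2, 3]))  # both equal 4/3 = 1.333...
```

The sorted form works because, after sorting, each $x_{(i)}$ is larger than $i$ observations and smaller than $n - 1 - i$ of them.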
More examples, in particular of rank correlation statistics, will be given in
Section 9.


5. The variance of a U-statistic. Let $X_1, \dots, X_n$ be $n$ independent random
vectors with the same d.f. $F(x) = F(x^{(1)}, \dots, x^{(r)})$, and let

(5.1) $U = U(X_1, \dots, X_n) = \binom{n}{m}^{-1} \sum' \Phi(X_{\alpha_1}, \dots, X_{\alpha_m}),$

where $\Phi(x_1, \dots, x_m)$ is symmetric in $x_1, \dots, x_m$ and $\sum'$ has the same meaning
as in (4.4). Suppose that the function $\Phi$ does not involve $n$.

If $\theta = \theta(F)$ is defined by (3.2), we have

$E\{U\} = E\{\Phi(X_1, \dots, X_m)\} = \theta.$

Let

(5.2) $\Phi_c(x_1, \dots, x_c) = E\{\Phi(x_1, \dots, x_c, X_{c+1}, \dots, X_m)\}, \quad (c = 1, \dots, m),$

where $x_1, \dots, x_c$ are arbitrary fixed vectors and the expected value is taken with
respect to the random vectors $X_{c+1}, \dots, X_m$. Then

(5.3) $\Phi_{c-1}(x_1, \dots, x_{c-1}) = E\{\Phi_c(x_1, \dots, x_{c-1}, X_c)\},$

and

(5.4) $E\{\Phi_c(X_1, \dots, X_c)\} = \theta, \quad (c = 1, \dots, m).$

Define

(5.5) $\Psi(x_1, \dots, x_m) = \Phi(x_1, \dots, x_m) - \theta,$

(5.6) $\Psi_c(x_1, \dots, x_c) = \Phi_c(x_1, \dots, x_c) - \theta, \quad (c = 1, \dots, m).$

We have

(5.7) $\Psi_{c-1}(x_1, \dots, x_{c-1}) = E\{\Psi_c(x_1, \dots, x_{c-1}, X_c)\},$

(5.8) $E\{\Psi_c(X_1, \dots, X_c)\} = E\{\Psi(X_1, \dots, X_m)\} = 0, \quad (c = 1, \dots, m).$


Suppose that the variance of $\Psi_c(X_1, \dots, X_c)$ exists, and let

(5.9) $\zeta_0 = 0, \quad \zeta_c = E\{\Psi_c^2(X_1, \dots, X_c)\}, \quad (c = 1, \dots, m).$

We have

(5.10) $\zeta_c = E\{\Phi_c^2(X_1, \dots, X_c)\} - \theta^2.$

$\zeta_c = \zeta_c(F)$ is a polynomial in regular functionals of $F$, and hence itself a regular
functional of $F$ (of degree $\le 2m$).

If, for some parent distribution $F = F_0$ and some integer $d$, we have $\zeta_d(F_0) = 0$,
this means that $\Psi_d(X_1, \dots, X_d) = 0$ with probability 1. By (5.7) and (5.9),
$\zeta_d = 0$ implies $\zeta_1 = \cdots = \zeta_{d-1} = 0$.
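For a discrete parent distribution the quantities $\Phi_c$ and $\zeta_c$ of (5.2) and (5.9)-(5.10) can be computed exactly by summing over the support. The minimal Python sketch below assumes univariate observations and a hypothetical `{value: probability}` dictionary for the d.f.; it uses exact `fractions.Fraction` arithmetic. For the variance kernel $(x_1 - x_2)^2/2$ and the three-point distribution shown (mean 1, variance 3/2, fourth central moment 9/2) it reproduces the closed forms $\zeta_1 = (\mu_4 - \sigma^4)/4$ and $\zeta_2 = (\mu_4 + \sigma^4)/2$.

```python
from fractions import Fraction
from itertools import product


def prob_of(values, dist):
    """Probability of a tuple of independent draws from dist = {value: probability}."""
    p = Fraction(1)
    for v in values:
        p *= dist[v]
    return p


def phi_c(phi, m, fixed, dist):
    """(5.2): E{Phi(x_1..x_c, X_{c+1}..X_m)} with x_1..x_c = fixed held constant."""
    c = len(fixed)
    return sum(prob_of(rest, dist) * phi(*fixed, *rest)
               for rest in product(dist, repeat=m - c))


def zetas(phi, m, dist):
    """theta and zeta_c = E{Phi_c^2(X_1..X_c)} - theta^2, c = 1..m, per (5.9)-(5.10)."""
    theta = phi_c(phi, m, (), dist)
    out = []
    for c in range(1, m + 1):
        second = sum(prob_of(fx, dist) * phi_c(phi, m, fx, dist) ** 2
                     for fx in product(dist, repeat=c))
        out.append(second - theta ** 2)
    return theta, out


dist = {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}
phi = lambda x1, x2: Fraction((x1 - x2) ** 2, 2)   # symmetric kernel of the variance
theta, (zeta1, zeta2) = zetas(phi, 2, dist)
print(theta, zeta1, zeta2)  # 3/2 9/16 27/8
```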
If $\zeta_1(F_0) = 0$, we shall say that the regular functional $\theta(F)$ is stationary³
for $F = F_0$. If

(5.11) $\zeta_1(F_0) = \cdots = \zeta_d(F_0) = 0, \quad \zeta_{d+1}(F_0) > 0, \quad (1 \le d \le m),$

$\theta(F)$ will be called stationary of order $d$ for $F = F_0$.

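If $(\alpha_1, \dots, \alpha_m)$ and $(\beta_1, \dots, \beta_m)$ are two sets of $m$ different integers, $1 \le \alpha_i$,
$\beta_i \le n$, and $c$ is the number of integers common to the two sets, we have, by the
symmetry of $\Psi$,

(5.12) $E\{\Psi(X_{\alpha_1}, \dots, X_{\alpha_m})\, \Psi(X_{\beta_1}, \dots, X_{\beta_m})\} = \zeta_c.$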

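If the variance of $U$ exists, it is equal to

$\sigma^2(U) = \binom{n}{m}^{-2} \sum_{c=0}^{m} \sum^{(c)} E\{\Psi(X_{\alpha_1}, \dots, X_{\alpha_m})\, \Psi(X_{\beta_1}, \dots, X_{\beta_m})\},$

where $\sum^{(c)}$ stands for summation over all subscripts such that

$1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_m \le n, \quad 1 \le \beta_1 < \beta_2 < \cdots < \beta_m \le n,$

and exactly $c$ equations

$\alpha_i = \beta_j$

are satisfied. By (5.12), each term in $\sum^{(c)}$ is equal to $\zeta_c$. The number of terms
in $\sum^{(c)}$ is easily seen to be

$\frac{n(n-1)\cdots(n-2m+c+1)}{c!\,(m-c)!\,(m-c)!} = \binom{m}{c}\binom{n-m}{m-c}\binom{n}{m},$

and hence, since $\zeta_0 = 0$,

(5.13) $\sigma^2(U) = \binom{n}{m}^{-1} \sum_{c=1}^{m} \binom{m}{c}\binom{n-m}{m-c}\, \zeta_c.$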
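Formula (5.13) can be compared with a brute-force computation of $\mathrm{Var}(U)$ over all samples from a small discrete distribution. In the minimal Python sketch below (hypothetical function names, univariate data, d.f. given as a `{value: probability}` dictionary), the kernel is $(x_1 - x_2)^2/2$ and the three-point distribution has $\sigma^2 = 3/2$ and $\mu_4 = 9/2$, so that $\zeta_1 = (\mu_4 - \sigma^4)/4 = 9/16$ and $\zeta_2 = (\mu_4 + \sigma^4)/2 = 27/8$; these two values are entered by hand.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def variance_formula(n, m, zeta):
    """(5.13), with zeta[c - 1] holding zeta_c for c = 1, ..., m."""
    total = sum(comb(m, c) * comb(n - m, m - c) * zeta[c - 1] for c in range(1, m + 1))
    return total / comb(n, m)


def exact_variance_of_u(phi0, m, n, dist):
    """Brute-force Var(U) over every n-sample from dist = {value: probability}."""
    mean = Fraction(0)
    second = Fraction(0)
    for sample in product(dist, repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= dist[x]
        u = Fraction(sum(phi0(*(sample[i] for i in idx))
                         for idx in combinations(range(n), m)), comb(n, m))
        mean += prob * u
        second += prob * u * u
    return second - mean * mean


dist = {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}
phi0 = lambda x1, x2: Fraction((x1 - x2) ** 2, 2)   # symmetric kernel of the variance
zeta = [Fraction(9, 16), Fraction(27, 8)]          # zeta_1, zeta_2 for this dist
for n in (2, 3, 4):
    # formula and enumeration agree: 27/8, 3/2, 15/16 for n = 2, 3, 4
    print(n, variance_formula(n, 2, zeta), exact_variance_of_u(phi0, 2, n, dist))
```

The case $n = m = 2$ also illustrates that $\sigma^2(U_m) = \zeta_m$, since the only surviving term in (5.13) is $c = m$.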

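When the distributions of $X_1, \dots, X_n$ are different, $F_\nu(x)$ being the d.f. of
$X_\nu$, let

(5.14) $\theta_{\alpha_1, \dots, \alpha_m} = E\{\Phi(X_{\alpha_1}, \dots, X_{\alpha_m})\},$

(5.15) $\Psi_{c(\alpha_1, \dots, \alpha_c)\beta_1, \dots, \beta_{m-c}}(x_1, \dots, x_c) = E\{\Phi(x_1, \dots, x_c, X_{\beta_1}, \dots, X_{\beta_{m-c}})\} - \theta_{\alpha_1, \dots, \alpha_c, \beta_1, \dots, \beta_{m-c}},$

$(c = 1, \dots, m),$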


³ According to the definition of the derivative of a functional (cf. Volterra [21]; for
functionals of d.f.'s cf. von Mises [18]), the function $m(m-1)\cdots(m-d+1)\,\Psi_d(x_1, \dots, x_d)$,
which is a functional of $F$, is a $d$-th derivative of $\theta(F)$ with respect to $F$ at the "point" $F$
of the space of d.f.'s.
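(5.16) $\zeta_{c(\alpha_1, \dots, \alpha_c)\beta_1, \dots, \beta_{m-c};\gamma_1, \dots, \gamma_{m-c}} = E\{\Psi_{c(\alpha_1, \dots, \alpha_c)\beta_1, \dots, \beta_{m-c}}(X_{\alpha_1}, \dots, X_{\alpha_c})\, \Psi_{c(\alpha_1, \dots, \alpha_c)\gamma_1, \dots, \gamma_{m-c}}(X_{\alpha_1}, \dots, X_{\alpha_c})\},$

(5.17) $\zeta_{c,n} = \frac{c!\,(m-c)!\,(m-c)!}{n(n-1)\cdots(n-2m+c+1)} \sum \zeta_{c(\alpha_1, \dots, \alpha_c)\beta_1, \dots, \beta_{m-c};\gamma_1, \dots, \gamma_{m-c}},$

where the sum is extended over all subscripts $\alpha, \beta, \gamma$ such that

$1 \le \alpha_1 < \cdots < \alpha_c \le n, \quad 1 \le \beta_1 < \cdots < \beta_{m-c} \le n, \quad 1 \le \gamma_1 < \cdots < \gamma_{m-c} \le n,$
$\alpha_i \ne \beta_j, \quad \alpha_i \ne \gamma_j, \quad \beta_i \ne \gamma_j.$

Then the variance of $U$ is equal to

(5.18) $\sigma^2(U) = \binom{n}{m}^{-1} \sum_{c=1}^{m} \binom{m}{c}\binom{n-m}{m-c}\, \zeta_{c,n}.$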

Returning to the case of identically distributed $X$'s, we shall now prove some
inequalities satisfied by $\zeta_1, \dots, \zeta_m$ and $\sigma^2(U)$ which are contained in the
following theorems:

Theorem 5.1. The quantities $\zeta_1, \dots, \zeta_m$ as defined by (5.9) satisfy the
inequalities

(5.19) $0 \le \frac{\zeta_c}{c} \le \frac{\zeta_d}{d} \quad \text{if } 1 \le c < d \le m.$


Theorem 5 2 The variance a‘(U„) of a U-stalistic U n — U(X i, • , -V„), 
where X \, ■ • , A r „ are independent and identically distributed , satisfies the in¬ 
equalities 


(5.20) 

na i (U n ) is a decreasing function of n, 


-ix <Au n )<-i n . 

n n 


(5 21) 


(n + 1V 2 (77 b+ i) < n/iUf), 


which takes on its upper bound mf m foi n = m and lends to its lower bound ?n s fi 
as n increases: 


(5.22) 

(5.23) 


<r 2 ([/ra) = fm , 

lim na\U n ) = m . 


If $E\{U_n\} = \theta(F)$ is stationary of order $\ge d-1$ for the d.f. of $X_\alpha$, (5.20) may be replaced by

$$\frac{m}{d}K_n(m,d)\,\zeta_d \le \sigma^2(U_n) \le K_n(m,d)\,\zeta_m, \tag{5.24}$$

where

$$K_n(m,d) = \binom{n}{m}^{-1}\sum_{c=d}^{m}\binom{m-1}{c-1}\binom{n-m}{m-c}. \tag{5.25}$$
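Here is a small sketch of the constant $K_n(m,d)$ in (5.25), using exact rational arithmetic and our own helper names. For $d = 1$ the constant reduces to $m/n$, so (5.24) becomes (5.20). The values $\zeta = (0, 2, 4)$ in the second example are hypothetical. They were chosen only so that $\zeta_1 = 0$ and (5.19) holds.

```python
from fractions import Fraction
from math import comb


def k_n(n, m, d):
    """The constant K_n(m, d) of (5.25)."""
    num = sum(comb(m - 1, c - 1) * comb(n - m, m - c) for c in range(d, m + 1))
    return Fraction(num, comb(n, m))


def stationary_bounds(n, m, d, zeta_d, zeta_m):
    """Lower and upper bounds of (5.24); assumes zeta_1 = ... = zeta_{d-1} = 0."""
    k = k_n(n, m, d)
    return Fraction(m, d) * k * zeta_d, k * zeta_m


print(k_n(12, 3, 1))                      # 1/4, i.e. m/n, so (5.24) reduces to (5.20)
print(*stationary_bounds(8, 3, 2, 2, 4))  # hypothetical zeta = (0, 2, 4): 33/56 11/14
```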


We postpone the proofs of Theorems 5.1 and 5.2. 

(5.13) and (5.19) imply that a necessary and sufficient condition for the existence of $\sigma^2(U)$ is the existence of

$$\zeta_m = E\{\Phi^2(X_1, \dots, X_m)\} - \theta^2 \tag{5.26}$$

or that of $E\{\Phi^2(X_1, \dots, X_m)\}$.

If $\zeta_1 > 0$, $\sigma^2(U)$ is of order $n^{-1}$.

If $\theta(F)$ is stationary of order $d$ for $F = F_0$, that is, if (5.11) is satisfied, $\sigma^2(U)$ is of order $n^{-d-1}$. Only if, for some $F = F_0$, $\theta(F)$ is stationary of order $m$, where $m$ is the degree of $\theta(F)$, do we have $\sigma^2(U) = 0$, in which case $U$ is equal to a constant with probability 1.

For instance, if $\theta(F_0) = 0$, the functional $\theta^2(F)$ is stationary for $F = F_0$. Other examples of stationary "points" of a functional will be found in section 9d.
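For proving Theorem 5.1 we shall require the following

Lemma 5.1. If

$$\delta_d = \zeta_d - \binom{d}{1}\zeta_{d-1} + \binom{d}{2}\zeta_{d-2} - \dots + (-1)^{d-1}\binom{d}{d-1}\zeta_1, \tag{5.27}$$

we have

$$\delta_d \ge 0 \tag{5.28}$$

and

$$\zeta_d = \delta_d + \binom{d}{1}\delta_{d-1} + \dots + \binom{d}{d-1}\delta_1, \qquad (d = 1, \dots, m). \tag{5.29}$$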


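Proof. (5.29) follows from (5.27) by induction.

For proving (5.28) let

$$\eta_c = E\{\Phi_c^2(X_1, \dots, X_c)\}, \qquad (c = 0, 1, \dots, m),$$

with $\eta_0 = \theta^2$. Then, by (5.10),

$$\zeta_c = \eta_c - \eta_0,$$

and on substituting this in (5.27) (using $\sum_{c=0}^{d}(-1)^{d-c}\binom{d}{c} = 0$) we have

$$\delta_d = \sum_{c=0}^{d}(-1)^{d-c}\binom{d}{c}\eta_c .$$

From (5.9) it is seen that (5.28) is true for $d = 1$. Suppose that (5.28) holds for $1, \dots, d-1$. Then (5.28) will be shown to hold for $d$.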
Let

$$\Phi^0_0(x_1) = \Phi_1(x_1) - \theta, \qquad \Phi^0_c(x_1, x_2, \dots, x_{c+1}) = \Phi_{c+1}(x_1, \dots, x_{c+1}) - \Phi_c(x_2, \dots, x_{c+1}), \qquad (c = 1, \dots, d-1).$$

For an arbitrary fixed $x_1$, let

$$\eta^0_c(x_1) = E\{[\Phi^0_c(x_1, X_2, \dots, X_{c+1})]^2\}, \qquad (c = 0, \dots, d-1).$$

Then, by induction hypothesis,

$$\delta^0_{d-1}(x_1) = \sum_{c=0}^{d-1}(-1)^{d-1-c}\binom{d-1}{c}\eta^0_c(x_1) \ge 0$$

for any fixed $x_1$.

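Now, since $E\{\Phi_{c+1}(X_1, \dots, X_{c+1}) \mid X_2, \dots, X_{c+1}\} = \Phi_c(X_2, \dots, X_{c+1})$,

$$E\{\eta^0_c(X_1)\} = \eta_{c+1} - \eta_c,$$

and hence

$$E\{\delta^0_{d-1}(X_1)\} = \sum_{c=0}^{d-1}(-1)^{d-1-c}\binom{d-1}{c}(\eta_{c+1} - \eta_c) = \sum_{c=0}^{d}(-1)^{d-c}\binom{d}{c}\eta_c = \delta_d ,$$

so that $\delta_d \ge 0$. The proof of Lemma 5.1 is complete.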
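The binomial inversion between (5.27) and (5.29) can be sketched as follows, using exact arithmetic and our own helper names. As a hypothetical example, take the kernel $\Phi(x_1, x_2) = x_1x_2$ with $X$ a Bernoulli variable with $p = 1/3$. By direct computation (ours), $\zeta_1 = p^3(1-p)$ and $\zeta_2 = p^2 - p^4$. The resulting $\delta_2 = p^2(1-p)^2$ is nonnegative, as (5.28) asserts.

```python
from fractions import Fraction
from math import comb


def deltas_from_zetas(zeta):
    """delta_d from (5.27); zeta[c] holds zeta_c for c = 0, ..., m, with zeta[0] = 0."""
    return [sum((-1) ** (d - c) * comb(d, c) * zeta[c] for c in range(d + 1))
            for d in range(len(zeta))]


def zetas_from_deltas(delta):
    """Inverse relation (5.29): zeta_d = sum over c of C(d, c) delta_c (binomial inversion)."""
    return [sum(comb(d, c) * delta[c] for c in range(d + 1)) for d in range(len(delta))]


# Hypothetical example: Phi(x1, x2) = x1 * x2, X ~ Bernoulli(p) with p = 1/3.
p = Fraction(1, 3)
zeta = [Fraction(0), p ** 3 * (1 - p), p ** 2 - p ** 4]
delta = deltas_from_zetas(zeta)
print(delta[2] == p ** 2 * (1 - p) ** 2)  # True: delta_2 >= 0 as in (5.28)
```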

Proof of Theorem 5.1. By (5.29) we have for $c < d$

$$c\,\zeta_d - d\,\zeta_c = \sum_{a=1}^{c}\left[c\binom{d}{a} - d\binom{c}{a}\right]\delta_a + c\sum_{a=c+1}^{d}\binom{d}{a}\delta_a. \tag{5.30}$$

From (5.28), and since $c\binom{d}{a} - d\binom{c}{a} \ge 0$ if $1 \le a \le c < d$, it follows that each term in the two sums of (5.30) is not negative. This, in connection with (5.9), proves Theorem 5.1.

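Proof of Theorem 5.2. From (5.19) we have

$$c\,\zeta_1 \le \zeta_c \le \frac{c}{m}\zeta_m, \qquad (c = 1, \dots, m).$$

Applying these inequalities to each term in (5.13) and using the identity

$$\binom{n}{m}^{-1}\sum_{c=1}^{m} c\binom{m}{c}\binom{n-m}{m-c} = \frac{m^2}{n}, \tag{5.31}$$

we obtain (5.20).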

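(5.22) and (5.23) follow immediately from (5.13).

For (5.21) we may write

$$D_n \ge 0, \tag{5.32}$$

where

$$D_n = n\,\sigma^2(U_n) - (n+1)\,\sigma^2(U_{n+1}).$$

Let

$$D_n = \sum_{c=1}^{m} d_{n,c}\,\zeta_c .$$

Then we have from (5.13)

$$d_{n,c} = n\binom{n}{m}^{-1}\binom{m}{c}\binom{n-m}{m-c} - (n+1)\binom{n+1}{m}^{-1}\binom{m}{c}\binom{n+1-m}{m-c} = \binom{n}{m}^{-1}\binom{m}{c}\binom{n-m+1}{m-c}\frac{(c-1)\,n - (m-1)^2}{n-m+1}, \qquad (1 \le c \le m \le n). \tag{5.33}$$

Putting

$$c_0 = 1 + \left[\frac{(m-1)^2}{n}\right],$$

where $[u]$ denotes the largest integer $\le u$, we have

$$d_{n,c} \le 0 \ \text{ if } c \le c_0, \qquad d_{n,c} \ge 0 \ \text{ if } c > c_0 .$$

Hence, by (5.19),

$$d_{n,c}\,\zeta_c \ge \frac{c}{c_0}\,d_{n,c}\,\zeta_{c_0}, \qquad (c = 1, \dots, m),$$

and

$$D_n \ge \frac{\zeta_{c_0}}{c_0}\sum_{c=1}^{m} c\,d_{n,c}.$$

By (5.33) and (5.31), the latter sum vanishes. This proves (5.32).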

For the stationary case $\zeta_1 = \dots = \zeta_{d-1} = 0$, (5.24) is a direct consequence of (5.13) and (5.19). The proof of Theorem 5.2 is complete.
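The following sketch checks (5.20) and (5.21) numerically for one hypothetical sequence $\zeta = (1, 3, 6)$, which satisfies (5.19). It also compares the coefficient $d_{n,c}$, computed directly from (5.13), with the closed form in (5.33). This illustrates the inequalities under assumed values only; it does not replace the proof.

```python
from fractions import Fraction
from math import comb


def var_u(n, m, zeta):
    """sigma^2(U_n) from (5.13); zeta[c - 1] holds zeta_c."""
    return sum(comb(m, c) * comb(n - m, m - c) * Fraction(zeta[c - 1])
               for c in range(1, m + 1)) / comb(n, m)


def d_direct(n, m, c):
    """Coefficient of zeta_c in D_n = n sigma^2(U_n) - (n+1) sigma^2(U_{n+1}), read off (5.13)."""
    return (Fraction(n * comb(m, c) * comb(n - m, m - c), comb(n, m))
            - Fraction((n + 1) * comb(m, c) * comb(n + 1 - m, m - c), comb(n + 1, m)))


def d_closed(n, m, c):
    """Closed form of d_{n,c} given in (5.33)."""
    return Fraction(comb(m, c) * comb(n - m + 1, m - c) * ((c - 1) * n - (m - 1) ** 2),
                    comb(n, m) * (n - m + 1))


# Hypothetical zeta values obeying (5.19): zeta_c / c = 1, 3/2, 2 is nondecreasing.
zeta, m = [1, 3, 6], 3
for n in range(m, 30):
    assert m * m * Fraction(zeta[0], n) <= var_u(n, m, zeta) <= m * Fraction(zeta[-1], n)  # (5.20)
    assert (n + 1) * var_u(n + 1, m, zeta) <= n * var_u(n, m, zeta)                      # (5.21)
print(3 * var_u(3, m, zeta))  # 18, i.e. m * zeta_m at n = m
```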

6. The covariance of two U-statistics. Consider a set of $g$ U-statistics,

$$U^{(\gamma)} = \binom{n}{m(\gamma)}^{-1}\sum{}' \Phi^{(\gamma)}(X_{\alpha_1}, \dots, X_{\alpha_{m(\gamma)}}), \qquad (\gamma = 1, \dots, g),$$

each $U^{(\gamma)}$ being a function of the same $n$ independent, identically distributed random vectors $X_1, \dots, X_n$. The function $\Phi^{(\gamma)}$ is assumed to be symmetric in its $m(\gamma)$ arguments $(\gamma = 1, \dots, g)$.

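Let

$$E\{U^{(\gamma)}\} = E\{\Phi^{(\gamma)}(X_1, \dots, X_{m(\gamma)})\} = \theta^{(\gamma)}, \qquad (\gamma = 1, \dots, g);$$

$$\Psi^{(\gamma)}(x_1, \dots, x_{m(\gamma)}) = \Phi^{(\gamma)}(x_1, \dots, x_{m(\gamma)}) - \theta^{(\gamma)}, \qquad (\gamma = 1, \dots, g); \tag{6.1}$$

$$\Psi^{(\gamma)}_c(x_1, \dots, x_c) = E\{\Psi^{(\gamma)}(x_1, \dots, x_c, X_{c+1}, \dots, X_{m(\gamma)})\}, \qquad (c = 1, \dots, m(\gamma);\ \gamma = 1, \dots, g); \tag{6.2}$$

$$\zeta^{(\gamma,\delta)}_c = E\{\Psi^{(\gamma)}_c(X_1, \dots, X_c)\,\Psi^{(\delta)}_c(X_1, \dots, X_c)\}, \qquad (\gamma, \delta = 1, \dots, g). \tag{6.3}$$

If, in particular, $\gamma = \delta$, we shall write

$$\zeta^{(\gamma)}_c = \zeta^{(\gamma,\gamma)}_c = E\{\Psi^{(\gamma)}_c(X_1, \dots, X_c)\}^2. \tag{6.4}$$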


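Let

$$\sigma(U^{(\gamma)}, U^{(\delta)}) = E\{(U^{(\gamma)} - \theta^{(\gamma)})(U^{(\delta)} - \theta^{(\delta)})\}$$

be the covariance of $U^{(\gamma)}$ and $U^{(\delta)}$.

In a similar way as for the variance, we find, if $m(\gamma) \le m(\delta)$,

$$\sigma(U^{(\gamma)}, U^{(\delta)}) = \binom{n}{m(\gamma)}^{-1}\sum_{c=1}^{m(\gamma)}\binom{m(\delta)}{c}\binom{n-m(\delta)}{m(\gamma)-c}\,\zeta^{(\gamma,\delta)}_c. \tag{6.5}$$

The right hand side is easily seen to be symmetric in $\gamma$, $\delta$.

For $\gamma = \delta$, (6.5) is the variance of $U^{(\gamma)}$ (cf. (5.13)).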
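The two ways of counting pairs of index sets must agree. So (6.5) can equally be written with the common elements counted from the $m(\delta)$-subsets, which is one way to see the symmetry in $\gamma$, $\delta$. The sketch below evaluates both forms. The helper names are hypothetical, the arithmetic is exact, and the values $\zeta^{(\gamma,\delta)}_c$ are supplied as inputs.

```python
from fractions import Fraction
from math import comb


def cov_u(n, m_g, m_d, zeta):
    """(6.5) with m_g <= m_d; zeta[c - 1] holds zeta_c^{(gamma, delta)}."""
    return sum(comb(m_d, c) * comb(n - m_d, m_g - c) * Fraction(zeta[c - 1])
               for c in range(1, m_g + 1)) / comb(n, m_g)


def cov_u_swapped(n, m_g, m_d, zeta):
    """The same covariance, counting common elements from the m_d-subsets instead."""
    return sum(comb(m_g, c) * comb(n - m_g, m_d - c) * Fraction(zeta[c - 1])
               for c in range(1, m_g + 1)) / comb(n, m_d)


print(cov_u(10, 2, 3, [1, 2]), cov_u_swapped(10, 2, 3, [1, 2]))  # 3/5 3/5
```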

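We have from (5.23) and (6.5)

$$\lim_{n\to\infty} n\,\sigma^2(U^{(\gamma)}) = m^2(\gamma)\,\zeta^{(\gamma)}_1,$$

$$\lim_{n\to\infty} n\,\sigma(U^{(\gamma)}, U^{(\delta)}) = m(\gamma)\,m(\delta)\,\zeta^{(\gamma,\delta)}_1 .$$

Hence, if $\zeta^{(\gamma)}_1 \ne 0$ and $\zeta^{(\delta)}_1 \ne 0$, the product moment correlation $\rho(U^{(\gamma)}, U^{(\delta)})$ between $U^{(\gamma)}$ and $U^{(\delta)}$ tends to the limit

$$\lim_{n\to\infty}\rho(U^{(\gamma)}, U^{(\delta)}) = \frac{\zeta^{(\gamma,\delta)}_1}{\sqrt{\zeta^{(\gamma)}_1\,\zeta^{(\delta)}_1}}. \tag{6.6}$$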


7. Limit theorems for the case of identically distributed $X_\alpha$'s. We shall now study the asymptotic distribution of U-statistics and certain related functions. In this section the vectors $X_\alpha$ will be assumed to be identically distributed. An extension to the case of different parent distributions will be given in section 8.

Following Cramér [2, p. 83] we shall say that a sequence of d.f.'s $F_1(x), F_2(x), \dots$ converges to a d.f. $F(x)$ if $\lim F_n(x) = F(x)$ at every point at which the one-dimensional marginal limiting d.f.'s are continuous.
Let us recall (cf. Cramér [2, p. 312]) that a $g$-variate normal distribution is called non-singular if the rank $r$ of its covariance matrix is equal to $g$, and singular if $r < g$.

The following lemma will be used in the proofs.

Lemma 7.1. Let $V_1, V_2, \dots$ be an infinite sequence of random vectors $V_n = (V_n^{(1)}, \dots, V_n^{(g)})$, and suppose that the d.f. $F_n(v)$ of $V_n$ tends to a d.f. $F(v)$ as $n \to \infty$. Let $V_n'^{(\gamma)} = V_n^{(\gamma)} + d_n^{(\gamma)}$, where

$$\lim_{n\to\infty} E\{d_n^{(\gamma)}\}^2 = 0, \qquad (\gamma = 1, \dots, g). \tag{7.1}$$

Then the d.f. of $V_n' = (V_n'^{(1)}, \dots, V_n'^{(g)})$ tends to $F(v)$.

This is an immediate consequence of the well-known fact that the d.f. of $V_n'$ tends to $F(v)$ if $d_n^{(\gamma)}$ converges in probability to 0 (cf. Cramér [2, p. 299]), since the fulfillment of (7.1) is sufficient for the latter condition.
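Theorem 7.1. Let $X_1, \dots, X_n$ be $n$ independent, identically distributed random vectors,

$$X_\alpha = (X_\alpha^{(1)}, \dots, X_\alpha^{(r)}), \qquad (\alpha = 1, \dots, n).$$

Let

$$\Phi^{(\gamma)}(x_1, \dots, x_{m(\gamma)}), \qquad (\gamma = 1, \dots, g),$$

be $g$ real-valued functions not involving $n$, $\Phi^{(\gamma)}$ being symmetric in its $m(\gamma)$ $(\le n)$ vector arguments $x_\alpha = (x_\alpha^{(1)}, \dots, x_\alpha^{(r)})$, $(\alpha = 1, \dots, m(\gamma);\ \gamma = 1, \dots, g)$. Define

$$U^{(\gamma)} = \binom{n}{m(\gamma)}^{-1}\sum{}' \Phi^{(\gamma)}(X_{\alpha_1}, \dots, X_{\alpha_{m(\gamma)}}), \qquad (\gamma = 1, \dots, g), \tag{7.2}$$

where the summation is over all subscripts such that $1 \le \alpha_1 < \dots < \alpha_{m(\gamma)} \le n$. Then, if the expected values

$$\theta^{(\gamma)} = E\{\Phi^{(\gamma)}(X_1, \dots, X_{m(\gamma)})\}, \qquad (\gamma = 1, \dots, g), \tag{7.3}$$

and

$$E\{\Phi^{(\gamma)}(X_1, \dots, X_{m(\gamma)})\}^2, \qquad (\gamma = 1, \dots, g), \tag{7.4}$$

exist, the joint d.f. of

$$\sqrt{n}\,(U^{(1)} - \theta^{(1)}), \dots, \sqrt{n}\,(U^{(g)} - \theta^{(g)})$$

tends, as $n \to \infty$, to the $g$-variate normal d.f. with zero means and covariance matrix $(m(\gamma)\,m(\delta)\,\zeta^{(\gamma,\delta)}_1)$, where $\zeta^{(\gamma,\delta)}_1$ is defined by (6.3). The limiting distribution is non-singular if the determinant $|\zeta^{(\gamma,\delta)}_1|$ is positive.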

Before proving Theorem 7.1, a few words may be said about its meaning and its relation to well-known results.

For $g = 1$, Theorem 7.1 states that the distribution of a U-statistic tends, under certain conditions, to the normal form. For $m = 1$, $U$ is a sum of $n$ independent random variables, and in this case Theorem 7.1 reduces to the Central Limit Theorem for such sums. For $m > 1$, $U$ is a sum of random variables which, in general, are not independent. Under certain assumptions about the function $\Phi(x_1, \dots, x_m)$ the asymptotic normality of $U$ can be inferred from the Central Limit Theorem by well-known methods. If, for instance, $\Phi$ is a polynomial (as in the case of the k-statistics or the unbiased estimates of moments), $U$ can be expressed as a polynomial in moments about the origin which are sums of independent random variables, and for this case the tendency to normality of $U$ can easily be shown (cf. Cramér [2, p. 365]).

Theorem 7.1 generalizes these results, stating that in the case of independent and identically distributed $X_\alpha$'s the existence of $E\{\Phi^2(X_1, \dots, X_m)\}$ is sufficient for the asymptotic normality of $U$. No regularity conditions are imposed on the function $\Phi$. This point is important for some applications (cf. section 9).

Theorem 7.1 and the following theorems of sections 7 and 8 are closely related to recent results of von Mises [18] which were published after this paper was essentially completed. It will be seen below (Theorem 7.4) that the limiting distribution of $\sqrt{n}\,[U - \theta(F)]$ is the same as that of $\sqrt{n}\,[\theta(S) - \theta(F)]$ (cf. (4.5)) if the variance of $\theta(S)$ exists. $\theta(S)$ is a differentiable statistical function in the sense of von Mises, and by Theorem I of [18], $\sqrt{n}\,[\theta(S) - \theta(F)]$ is asymptotically normal if certain conditions are satisfied. It will be found that in certain cases, for instance if the kernel $\Phi$ of $\theta$ is a polynomial, the conditions of the theorems of sections 7 and 8 are somewhat weaker than those of von Mises' theorem. Though von Mises' paper is concerned with functionals of univariate d.f.'s only, its results can easily be extended to the multivariate case.

For the particular case of a discrete population (where $F$ is a step function), $U$ and $\theta(S)$ are polynomials in the sample frequencies, and their asymptotic distribution may be inferred from the fact that the joint distribution of the frequencies tends to the normal form (cf. also von Mises [18]).

In Theorem 7.1 the functions $\Phi^{(\gamma)}(x_1, \dots, x_{m(\gamma)})$ are supposed to be symmetric. Since, as has been seen in section 4, any U-statistic with non-symmetric kernel can be written in the form (4.4) with a symmetric kernel, this restriction is not essential and has been made only for the sake of convenience. Moreover, in the condition of the existence of $E\{\Phi^2(X_1, \dots, X_m)\}$, the symmetric kernel may be replaced by a non-symmetric one. For, if $\Phi$ is non-symmetric, and $\Phi_0$ is the symmetric kernel defined by (3.3), $E\{\Phi_0^2(X_1, \dots, X_m)\}$ is a linear combination of terms of the form $E\{\Phi(X_{\alpha_1}, \dots, X_{\alpha_m})\,\Phi(X_{\beta_1}, \dots, X_{\beta_m})\}$, whose existence follows from that of $E\{\Phi^2(X_1, \dots, X_m)\}$ by Schwarz's inequality.

If the regular functional $\theta(F)$ is stationary for $F = F_0$, that is, if $\zeta_1 = \zeta_1(F_0) = 0$ (cf. section 5), the limiting normal distribution of $\sqrt{n}\,(U - \theta)$ is, according to Theorem 7.1, singular, that is, its variance is zero. As has been seen in section 5, $\sigma^2(U)$ need not be zero in this case, but may be of some order $n^{-c}$, $(c = 2, 3, \dots, m)$, and the distribution of $n^{c/2}(U - \theta)$ may tend to a limiting form which is not normal. According to von Mises [18], it is a limiting distribution of type $c$, $(c = 2, 3, \dots)$.
According to Theorem 5.2, $\sigma^2(U)$ is never less than its asymptotic value $m^2\zeta_1/n$. Hence, if we apply Theorem 7.1 for approximating the distribution of $U$ when $n$ is large but finite, we will in general underestimate the variance of $U$. For many applications this is undesirable, and for such cases the following theorem, which is an immediate consequence of Theorem 7.1, will be more useful.
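Theorem 7.2. Under the conditions of Theorem 7.1, and if

$$\zeta^{(\gamma)}_1 > 0, \qquad (\gamma = 1, \dots, g),$$

the joint d.f. of

$$\frac{U^{(1)} - \theta^{(1)}}{\sigma(U^{(1)})}, \dots, \frac{U^{(g)} - \theta^{(g)}}{\sigma(U^{(g)})}$$

tends, as $n \to \infty$, to the $g$-variate normal d.f. with zero means and covariance matrix $(\rho^{(\gamma,\delta)})$, where

$$\rho^{(\gamma,\delta)} = \lim_{n\to\infty}\frac{\sigma(U^{(\gamma)}, U^{(\delta)})}{\sigma(U^{(\gamma)})\,\sigma(U^{(\delta)})} = \frac{\zeta^{(\gamma,\delta)}_1}{\sqrt{\zeta^{(\gamma)}_1\,\zeta^{(\delta)}_1}}, \qquad (\gamma, \delta = 1, \dots, g).$$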


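Proof of Theorem 7.1. The existence of (7.4) entails that of

$$\zeta^{(\gamma)}_{m(\gamma)} = E\{\Phi^{(\gamma)}(X_1, \dots, X_{m(\gamma)})\}^2 - (\theta^{(\gamma)})^2,$$

which, by (5.19), (5.20) and (6.6), is sufficient for the existence of $\zeta^{(\gamma)}_c$, of $\sigma^2(U^{(\gamma)})$, and of $|\zeta^{(\gamma,\delta)}_1| \le \sqrt{\zeta^{(\gamma)}_1\,\zeta^{(\delta)}_1}$.

Now, consider the $g$ quantities

$$Y^{(\gamma)} = \frac{m(\gamma)}{\sqrt{n}}\sum_{\alpha=1}^{n}\Psi^{(\gamma)}_1(X_\alpha), \qquad (\gamma = 1, \dots, g),$$

where $\Psi^{(\gamma)}_1(x)$ is defined by (6.2). $Y^{(1)}, \dots, Y^{(g)}$ are sums of $n$ independent random variables with zero means, whose covariance matrix, by virtue of (6.3), is

$$\{\sigma(Y^{(\gamma)}, Y^{(\delta)})\} = \{m(\gamma)\,m(\delta)\,\zeta^{(\gamma,\delta)}_1\}. \tag{7.5}$$

By the Central Limit Theorem for vectors (cf. Cramér [1, p. 112]), the joint d.f. of $(Y^{(1)}, \dots, Y^{(g)})$ tends to the normal $g$-variate d.f. with the same means and covariances.

Theorem 7.1 will be proved by showing that the $g$ random variables

$$Z^{(\gamma)} = \sqrt{n}\,(U^{(\gamma)} - \theta^{(\gamma)}), \qquad (\gamma = 1, \dots, g), \tag{7.6}$$

have the same joint limiting distribution as $Y^{(1)}, \dots, Y^{(g)}$.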

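According to Lemma 7.1 it is sufficient to show that

$$\lim_{n\to\infty} E\{Z^{(\gamma)} - Y^{(\gamma)}\}^2 = 0, \qquad (\gamma = 1, \dots, g). \tag{7.7}$$

For proving (7.7), write

$$E\{Z^{(\gamma)} - Y^{(\gamma)}\}^2 = E\{Z^{(\gamma)}\}^2 + E\{Y^{(\gamma)}\}^2 - 2E\{Z^{(\gamma)}Y^{(\gamma)}\}. \tag{7.8}$$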
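By (5.13) we have

$$E\{Z^{(\gamma)}\}^2 = n\,\sigma^2(U^{(\gamma)}) = m^2(\gamma)\,\zeta^{(\gamma)}_1 + O(n^{-1}), \tag{7.9}$$

and from (7.5),

$$E\{Y^{(\gamma)}\}^2 = m^2(\gamma)\,\zeta^{(\gamma)}_1. \tag{7.10}$$

By (7.2) and (6.1) we may write for (7.6)

$$Z^{(\gamma)} = \sqrt{n}\binom{n}{m(\gamma)}^{-1}\sum{}' \Psi^{(\gamma)}(X_{\alpha_1}, \dots, X_{\alpha_{m(\gamma)}}),$$

and hence

$$E\{Z^{(\gamma)}Y^{(\gamma)}\} = m(\gamma)\binom{n}{m(\gamma)}^{-1}\sum_{\alpha=1}^{n}\sum{}' E\{\Psi^{(\gamma)}_1(X_\alpha)\,\Psi^{(\gamma)}(X_{\alpha_1}, \dots, X_{\alpha_{m(\gamma)}})\}.$$

The term

$$E\{\Psi^{(\gamma)}_1(X_\alpha)\,\Psi^{(\gamma)}(X_{\alpha_1}, \dots, X_{\alpha_{m(\gamma)}})\}$$

is $= \zeta^{(\gamma)}_1$ if

$$\alpha_1 = \alpha \quad \text{or} \quad \alpha_2 = \alpha \quad \text{or} \ \dots \ \text{or} \quad \alpha_{m(\gamma)} = \alpha, \tag{7.11}$$

and 0 otherwise. For a fixed $\alpha$, the number of sets $(\alpha_1, \dots, \alpha_{m(\gamma)})$ such that $1 \le \alpha_1 < \dots < \alpha_{m(\gamma)} \le n$ and (7.11) is satisfied is $\binom{n-1}{m(\gamma)-1}$. Thus,

$$E\{Z^{(\gamma)}Y^{(\gamma)}\} = m(\gamma)\binom{n}{m(\gamma)}^{-1} n\binom{n-1}{m(\gamma)-1}\zeta^{(\gamma)}_1 = m^2(\gamma)\,\zeta^{(\gamma)}_1. \tag{7.12}$$

On inserting (7.9), (7.10), and (7.12) in (7.8), we see that (7.7) is true.

The concluding remark in Theorem 7.1 is a direct consequence of the definition of a non-singular distribution. The proof of Theorem 7.1 is complete.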
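Inserting (7.9), (7.10) and (7.12) in (7.8) gives, exactly, $E\{Z^{(\gamma)} - Y^{(\gamma)}\}^2 = n\sigma^2(U^{(\gamma)}) - m^2(\gamma)\zeta^{(\gamma)}_1$. The sketch below evaluates this gap from (5.13) for assumed values of $\zeta_c$, with hypothetical helper names. For $m = 2$ the gap simplifies to $2(\zeta_2 - 2\zeta_1)/(n-1)$; that simplification is our own algebra.

```python
from fractions import Fraction
from math import comb


def var_u(n, m, zeta):
    """sigma^2(U_n) from (5.13); zeta[c - 1] holds zeta_c."""
    return sum(comb(m, c) * comb(n - m, m - c) * Fraction(zeta[c - 1])
               for c in range(1, m + 1)) / comb(n, m)


def projection_gap(n, m, zeta):
    """E(Z - Y)^2 = n sigma^2(U) - m^2 zeta_1, from (7.8) with (7.9), (7.10), (7.12)."""
    return n * var_u(n, m, zeta) - m * m * Fraction(zeta[0])


# Hypothetical degree-2 kernel with zeta_1 = 1, zeta_2 = 3: the gap equals 2 / (n - 1).
for n in (2, 5, 11, 101):
    print(n, projection_gap(n, 2, [1, 3]))
```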
Theorems 7.1 and 7.2 deal with the asymptotic distribution of $U^{(1)}, \dots, U^{(g)}$, which are unbiased estimates of $\theta^{(1)}, \dots, \theta^{(g)}$. The unbiasedness of a statistic is, of course, irrelevant for its asymptotic behavior, and the application of Lemma 7.1 leads immediately to the following extension of Theorem 7.1 to a larger class of statistics.

Theorem 7.3. Let

$$U'^{(\gamma)} = U^{(\gamma)} + \frac{b_n^{(\gamma)}}{\sqrt{n}}, \qquad (\gamma = 1, \dots, g), \tag{7.13}$$

where $U^{(\gamma)}$ is defined by (7.2) and $b_n^{(\gamma)}$ is a random variable. If the conditions of Theorem 7.1 are satisfied, and $\lim E\{b_n^{(\gamma)}\}^2 = 0$, $(\gamma = 1, \dots, g)$, then the joint distribution of

$$\sqrt{n}\,(U'^{(1)} - \theta^{(1)}), \dots, \sqrt{n}\,(U'^{(g)} - \theta^{(g)})$$

tends to the normal distribution with zero means and covariance matrix

$$\{m(\gamma)\,m(\delta)\,\zeta^{(\gamma,\delta)}_1\}.$$
This theorem applies, in particular, to the regular functionals $\theta(S)$ of the sample d.f.,

$$\theta(S) = \frac{1}{n^m}\sum_{\alpha_1=1}^{n}\cdots\sum_{\alpha_m=1}^{n}\Phi(X_{\alpha_1}, \dots, X_{\alpha_m}),$$

in the case that the variance of $\theta(S)$ exists. For we may write

$$n^m\,\theta(S) = m!\binom{n}{m}U + \sum{}^{*}\Phi(X_{\alpha_1}, \dots, X_{\alpha_m}),$$

where the sum $\sum^{*}$ is extended over all $m$-tuplets $(\alpha_1, \dots, \alpha_m)$ in which at least one equality $\alpha_i = \alpha_j$ $(i \ne j)$ is satisfied. The number of terms in $\sum^{*}$ is of order $n^{m-1}$. Hence

$$\theta(S) - U = \frac{1}{n}D,$$

where the expected value $E\{D^2\}$, whose existence follows from that of $\sigma^2\{\theta(S)\}$, is bounded for $n \to \infty$. Thus, if we put $U'^{(\gamma)} = \theta^{(\gamma)}(S)$, the conditions of Theorem 7.3 are fulfilled. We may summarize this result as follows:

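Theorem 7.4. Let $X_1, \dots, X_n$ be a random sample from an $r$-variate population with d.f. $F(x) = F(x^{(1)}, \dots, x^{(r)})$, and let

$$\theta^{(\gamma)}(F) = \int\cdots\int \Phi^{(\gamma)}(x_1, \dots, x_{m(\gamma)})\,dF(x_1)\cdots dF(x_{m(\gamma)}), \qquad (\gamma = 1, \dots, g),$$

be $g$ regular functionals of $F$, where $\Phi^{(\gamma)}(x_1, \dots, x_{m(\gamma)})$ is symmetric in the vectors $x_1, \dots, x_{m(\gamma)}$ and does not involve $n$. If $S(x)$ is the d.f. of the random sample, and if the variance of

$$\theta^{(\gamma)}(S) = \frac{1}{n^{m(\gamma)}}\sum_{\alpha_1=1}^{n}\cdots\sum_{\alpha_{m(\gamma)}=1}^{n}\Phi^{(\gamma)}(X_{\alpha_1}, \dots, X_{\alpha_{m(\gamma)}})$$

exists, the joint d.f. of

$$\sqrt{n}\,\{\theta^{(1)}(S) - \theta^{(1)}(F)\}, \dots, \sqrt{n}\,\{\theta^{(g)}(S) - \theta^{(g)}(F)\}$$

tends to the $g$-variate normal d.f. with zero means and covariance matrix

$$\{m(\gamma)\,m(\delta)\,\zeta^{(\gamma,\delta)}_1\}.$$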
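The sketch below computes $\theta(S)$ and $U$ for a small sample. It also checks the decomposition $n^m\theta(S) = m!\binom{n}{m}U + \sum^*$ that precedes Theorem 7.4. The kernel $(x - y)^2/2$ and the helper names are our own illustrative choices. The U-statistic of this kernel is the sample variance with divisor $n - 1$, and since the kernel vanishes on the diagonal, $\theta(S) = \frac{n-1}{n}U$.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb, factorial


def u_stat(xs, kernel, m):
    """U: kernel averaged over the C(n, m) subsets of distinct indices."""
    total = sum(kernel(*(xs[i] for i in idx)) for idx in combinations(range(len(xs)), m))
    return Fraction(total, comb(len(xs), m))


def v_stat(xs, kernel, m):
    """theta(S): kernel averaged over all n^m index tuples, repetitions allowed."""
    n = len(xs)
    total = sum(kernel(*(xs[i] for i in idx)) for idx in product(range(n), repeat=m))
    return Fraction(total, n ** m)


def star_sum(xs, kernel, m):
    """Sigma*: kernel summed over the index tuples with at least one repeated index."""
    n = len(xs)
    return sum(kernel(*(xs[i] for i in idx))
               for idx in product(range(n), repeat=m) if len(set(idx)) < m)


def half_sq_diff(x, y):
    """Kernel (x - y)^2 / 2, whose U-statistic is the sample variance with divisor n - 1."""
    return Fraction((x - y) ** 2, 2)


data = [1, 2, 4, 7]
print(u_stat(data, half_sq_diff, 2), v_stat(data, half_sq_diff, 2))  # 7 21/4
```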


The following theorem is concerned with the asymptotic distribution of a function of statistics of the form $U$ or $U'$.

Theorem 7.5. Let $(U') = (U'^{(1)}, \dots, U'^{(g)})$ be a random vector, where $U'^{(\gamma)}$ is defined by (7.13), and suppose that the conditions of Theorem 7.3 are satisfied. If the function $h(y) = h(y^{(1)}, \dots, y^{(g)})$ does not involve $n$ and is continuous together with its second order partial derivatives in some neighborhood of the point $(y) = (\theta) = (\theta^{(1)}, \dots, \theta^{(g)})$, then the distribution of the random variable $\sqrt{n}\,\{h(U') - h(\theta)\}$ tends to the normal distribution with mean zero and variance

$$\sum_{\gamma=1}^{g}\sum_{\delta=1}^{g} m(\gamma)\,m(\delta)\left(\frac{\partial h(y)}{\partial y^{(\gamma)}}\right)_{y=\theta}\left(\frac{\partial h(y)}{\partial y^{(\delta)}}\right)_{y=\theta}\zeta^{(\gamma,\delta)}_1 .$$
Theorem 7.5 follows from Theorem 7.3 in exactly the same way as the theorem on the asymptotic distribution of a function of moments follows from the fact of their asymptotic normality; cf. Cramér [2, p. 366]. We shall therefore omit the proof of Theorem 7.5. Since any moment whose variance exists has the form $U' = \theta(S)$ (cf. section 4 and Theorem 7.4), Theorem 7.5 is a generalization of the theorem on a function of moments.

8. Limit theorems for $U(X_1, \dots, X_n)$ when the $X_\alpha$'s have different distributions. The limit theorems of the preceding section can be extended to the case when the $X_\alpha$'s have different distributions. We shall only prove an extension to this case of Theorem 7.1 (or 7.2), confining ourselves, for the sake of simplicity, to the distribution of a single U-statistic.

The extension of Theorems 7.3 and 7.5 with $g = 1$ to this case is immediate. One has only to replace the reference to Theorem 7.1 by that to the following Theorem 8.1, and $\theta$ and $\zeta_1$ by $E\{U\}$ and $\zeta_{1,n}$.

Theorem 8.1. Let $X_1, \dots, X_n$ be $n$ independent random vectors of $r$ components, $X_\alpha$ having the d.f. $F_\alpha(x) = F_\alpha(x^{(1)}, \dots, x^{(r)})$. Let $\Phi(x_1, \dots, x_m)$ be a function symmetric in its $m$ vector arguments $x_\beta = (x_\beta^{(1)}, \dots, x_\beta^{(r)})$ which does not involve $n$, and let

$$\Psi_{1(\nu)}(x) = \binom{n-1}{m-1}^{-1}\sum \Psi_{1(\nu)\alpha_1, \dots, \alpha_{m-1}}(x), \qquad (\nu = 1, \dots, n), \tag{8.1}$$

where $\Psi_{1(\nu)\alpha_1, \dots, \alpha_{m-1}}$ is defined by (5.15), and the summation is extended over all subscripts $\alpha$ such that

$$1 \le \alpha_1 < \alpha_2 < \dots < \alpha_{m-1} \le n, \qquad \alpha_i \ne \nu, \quad (i = 1, \dots, m-1).$$

Suppose that there is a number $A$ such that for every $n = 1, 2, \dots$

$$\int\cdots\int \Phi^2(x_1, \dots, x_m)\,dF_{\alpha_1}(x_1)\cdots dF_{\alpha_m}(x_m) < A, \qquad (1 \le \alpha_1 < \alpha_2 < \dots < \alpha_m \le n), \tag{8.2}$$

that

$$E\,|\Psi_{1(\nu)}(X_\nu)|^3 < \infty, \qquad (\nu = 1, 2, \dots, n), \tag{8.3}$$

and

$$\lim_{n\to\infty}\ \sum_{\nu=1}^{n} E\,|\Psi_{1(\nu)}(X_\nu)|^3 \Big/ \Big\{\sum_{\nu=1}^{n} E\{\Psi^2_{1(\nu)}(X_\nu)\}\Big\}^{3/2} = 0. \tag{8.4}$$

Then, as $n \to \infty$, the d.f. of $(U - E\{U\})/\sigma(U)$ tends to the normal d.f. with mean 0 and variance 1.

The proof is similar to that of Theorem 7.1.

Let

$$W = \frac{m}{n}\sum_{\nu=1}^{n}\Psi_{1(\nu)}(X_\nu).$$

It will be shown that

(a) the d.f. of

$$V = \frac{W - E\{W\}}{\sigma(W)}$$

tends to the normal d.f. with mean 0 and variance 1, and that

(b) the d.f. of

$$V' = \frac{U - E\{U\}}{\sigma(U)}$$

tends to the same limit as the d.f. of $V$.

Part (a) follows immediately from (8.3) and (8.4) by Liapounoff's form of the Central Limit Theorem.

According to Lemma 7.1, (b) will be proved when it is shown that

$$\lim E\{V' - V\}^2 = \lim 2\left(1 - \frac{\sigma(U, W)}{\sigma(U)\,\sigma(W)}\right) = 0$$

or

$$\lim_{n\to\infty}\frac{\sigma(U, W)}{\sigma(U)\,\sigma(W)} = 1. \tag{8.5}$$

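Let $c$ be an integer, $1 \le c \le m$, and write

$$x = (x_1, \dots, x_c), \quad y = (y_1, \dots, y_{m-c}), \quad z = (z_1, \dots, z_{m-c}),$$
$$F_{(\alpha)}(x) = F_{\alpha_1}(x_1)\cdots F_{\alpha_c}(x_c), \quad F_{(\beta)}(y) = F_{\beta_1}(y_1)\cdots F_{\beta_{m-c}}(y_{m-c}), \quad F_{(\gamma)}(z) = F_{\gamma_1}(z_1)\cdots F_{\gamma_{m-c}}(z_{m-c}).$$

Then, by Schwarz's inequality,

$$\left|\int\cdots\int \Phi(x, y)\,\Phi(x, z)\,dF_{(\alpha)}(x)\,dF_{(\beta)}(y)\,dF_{(\gamma)}(z)\right| \le \left\{\int\cdots\int \Phi^2(x, y)\,dF_{(\alpha)}(x)\,dF_{(\beta)}(y)\cdot\int\cdots\int \Phi^2(x, z)\,dF_{(\alpha)}(x)\,dF_{(\gamma)}(z)\right\}^{1/2},$$

which, by (8.2), is $< A$ for any set of subscripts.

By the inequality for moments, $\theta_{\alpha_1 \dots \alpha_m}$, as defined by (5.14), is also uniformly bounded, and applying these inequalities to (5.16), it follows that there exists a number $B$ such that

$$|\zeta_{c(\alpha_1, \dots, \alpha_c)\beta_1, \dots, \beta_{m-c};\,\gamma_1, \dots, \gamma_{m-c}}| < B, \qquad (c = 1, \dots, m), \tag{8.6}$$

for every set of subscripts satisfying the inequalities

$$\alpha_i \ne \alpha_h, \quad \beta_j \ne \beta_k, \quad \gamma_j \ne \gamma_k, \quad \alpha_i \ne \beta_j, \quad \alpha_i \ne \gamma_j, \qquad (i, h = 1, \dots, c;\ j, k = 1, \dots, m-c;\ i \ne h,\ j \ne k).$$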
Now, we have

$$E\{W\} = 0$$

and

$$\sigma^2(W) = \frac{m^2}{n^2}\sum_{\nu=1}^{n} E\{\Psi^2_{1(\nu)}(X_\nu)\} \tag{8.7}$$

or, inserting (8.1) and recalling (5.16),

$$\sigma^2(W) = \frac{m^2}{n^2}\binom{n-1}{m-1}^{-2}\sum_{\nu=1}^{n}\sum{}'\sum{}' \zeta_{1(\nu)\alpha_1, \dots, \alpha_{m-1};\,\beta_1, \dots, \beta_{m-1}}, \tag{8.8}$$

the two sums $\sum'$ being over $\alpha_1 < \dots < \alpha_{m-1}$ $(\alpha_i \ne \nu)$ and $\beta_1 < \dots < \beta_{m-1}$ $(\beta_i \ne \nu)$, respectively. By (5.17), the sum of the terms whose subscripts $\nu, \alpha_1, \dots, \alpha_{m-1}, \beta_1, \dots, \beta_{m-1}$ are all different is equal to

$$\frac{n(n-1)\cdots(n-2m+2)}{(m-1)!\,(m-1)!}\,\zeta_{1,n} = n\binom{n-1}{m-1}\binom{n-m}{m-1}\zeta_{1,n}.$$

The number of the remaining terms is of order $n^{2m-2}$. Since, by (8.6), they are uniformly bounded, we have

$$\sigma^2(W) = \frac{m^2}{n}\zeta_{1,n} + O(n^{-2}). \tag{8.9}$$

Similarly, we have from (5.18)

$$\sigma^2(U) = \frac{m^2}{n}\zeta_{1,n} + O(n^{-2}),$$

and hence

$$\sigma(U) = \sigma(W) + O(n^{-1}). \tag{8.10}$$

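The covariance of $U$ and $W$ is

$$\sigma(U, W) = \frac{m}{n}\binom{n}{m}^{-1}\sum_{\nu=1}^{n}\sum{}' E\{\Psi_{1(\nu)}(X_\nu)\,\Psi_{m(\alpha_1, \dots, \alpha_m)}(X_{\alpha_1}, \dots, X_{\alpha_m})\}. \tag{8.11}$$

All terms except those in which one of the $\alpha$'s $= \nu$ vanish, and for the remaining ones we have, for fixed $\alpha_1, \dots, \alpha_m$,

$$E\{\Psi_{1(\nu)}(X_\nu)\,\Psi_{m(\alpha_1, \dots, \alpha_m)}(X_{\alpha_1}, \dots, X_{\alpha_m})\} = \binom{n-1}{m-1}^{-1}\sum E\{\Psi_{1(\nu)\beta_1, \dots, \beta_{m-1}}(X_\nu)\,\Psi_{1(\nu)\gamma_1, \dots, \gamma_{m-1}}(X_\nu)\},$$

where the summation sign refers to the $\beta$'s, and $\gamma_1, \dots, \gamma_{m-1}$ are the $\alpha$'s that are $\ne \nu$. Inserting this in (8.11) and comparing the result with (8.8), we see that

$$\sigma(U, W) = \sigma^2(W). \tag{8.12}$$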
From (8.12) and (8.10) we have

$$\frac{\sigma(U, W)}{\sigma(U)\,\sigma(W)} = \frac{\sigma(W)}{\sigma(U)} = \frac{n\,\sigma(W)}{n\,\sigma(W) + O(1)}.$$

Comparing condition (8.4) with (8.7), we see that we must have $n\,\sigma(W) \to \infty$ as $n \to \infty$. This shows the truth of (8.5). The proof of Theorem 8.1 is complete.

For some purposes the following corollary of Theorem 8.1 will be useful, where the conditions (8.2), (8.3), and (8.4) are replaced by other conditions which are more restrictive, but easier to apply.

Theorem 8.2. Theorem 8.1 holds if the conditions (8.2), (8.3), and (8.4) are replaced by the following.

There exist two positive numbers $C$, $D$ such that

$$\int\cdots\int |\Phi^3(x_1, \dots, x_m)|\,dF_{\alpha_1}(x_1)\cdots dF_{\alpha_m}(x_m) < C \tag{8.13}$$

for $\alpha_i = 1, 2, \dots$, $(i = 1, \dots, m)$, and

$$\zeta_{1(\nu)\alpha_1, \dots, \alpha_{m-1};\,\beta_1, \dots, \beta_{m-1}} > D \tag{8.14}$$

for any subscripts satisfying

$$1 \le \alpha_1 < \alpha_2 < \dots < \alpha_{m-1}, \quad 1 \le \beta_1 < \beta_2 < \dots < \beta_{m-1}, \quad 1 \le \nu, \quad \nu \ne \alpha_i, \ \nu \ne \beta_i .$$

We have to show that (8.2), (8.3), and (8.4) follow from (8.13) and (8.14).

(8.13) implies (8.2) by the inequality for moments. By a reasoning analogous to that used in the previous proof, applying Hölder's inequality instead of Schwarz's inequality, it follows from (8.13) that, for some constant $C'$,

$$E\,|\Psi_{1(\nu)}(X_\nu)|^3 < C'. \tag{8.15}$$

On the other hand, by (8.7), (8.8), and (8.14),

$$\sum_{\nu=1}^{n} E\{\Psi^2_{1(\nu)}(X_\nu)\} > nD. \tag{8.16}$$

(8.15) and (8.16) are sufficient for the fulfillment of (8.4).


9. Applications to particular statistics.

(a) Moments and functions of moments. It has been seen in section 4 that the k-statistics and the unbiased estimates of moments are U-statistics, while the sample moments are regular functionals of the sample d.f. By Theorems 7.1, 8.1, and 7.4 these statistics are asymptotically normally distributed, and by Theorem 7.5 the same is true for a function of moments, if the respective conditions are satisfied. These results are not new (cf., for example, Cramér [2]).

(b) Mean difference and coefficient of concentration. If $Y_1, \dots, Y_n$ are $n$ independent real-valued random variables, Gini's mean difference (without repetition) is defined by

$$d = \frac{1}{n(n-1)}\sum_{\alpha \ne \beta}|Y_\alpha - Y_\beta| .$$


WASSILY HOEFFDING


If the $Y_\alpha$'s have the same distribution $F$, the mean of $d$ is

$$\delta = \iint |y_1 - y_2|\,dF(y_1)\,dF(y_2),$$

and the variance, by (5.13), is

$$\sigma^2(d) = \frac{2}{n(n-1)}\{2\zeta_1(\delta)(n-2) + \zeta_2(\delta)\},$$

where

$$\zeta_1(\delta) = \int\left\{\int |y_1 - y_2|\,dF(y_2)\right\}^2 dF(y_1) - \delta^2, \tag{9.1}$$

$$\zeta_2(\delta) = \iint (y_1 - y_2)^2\,dF(y_1)\,dF(y_2) - \delta^2 = 2\sigma^2(F) - \delta^2. \tag{9.2}$$

The notation $\zeta_1(\delta)$, $\zeta_2(\delta)$ serves to indicate the relation of these functionals of $F$ to the functional $\delta(F)$; $\delta$ is here merely the symbol of the functional, not a particular value of it. In a similar way we shall write $\Phi(y_1, y_2 \mid \delta) = |y_1 - y_2|$, etc. When there is danger of confusing $\zeta_1(\delta)$ with $\zeta_1(F)$, we may write $\zeta_1(F \mid \delta)$.

U. S. Nair [19] has evaluated $\sigma^2(d)$ for several particular distributions.

By Theorem 7.1, $\sqrt{n}\,(d - \delta)$ is asymptotically normal if $\zeta_2(\delta)$ exists.
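The variance formula for $d$ can be checked exactly on a small discrete population by enumerating all samples. The sketch below does this. The population (values 0, 1, 2 with probabilities 1/2, 1/4, 1/4) and the helper names are hypothetical; $\zeta_1(\delta)$ and $\zeta_2(\delta)$ are computed from (9.1) and (9.2).

```python
from fractions import Fraction
from itertools import product


def mean_difference(ys):
    """Gini's mean difference without repetition: mean of |Y_a - Y_b| over a != b."""
    n = len(ys)
    return Fraction(sum(abs(a - b) for a in ys for b in ys), n * (n - 1))


def population_constants(values, probs):
    """delta, zeta_1(delta) from (9.1) and zeta_2(delta) from (9.2) for a discrete F."""
    pairs = list(zip(values, probs))
    delta = sum(p * q * abs(a - b) for a, p in pairs for b, q in pairs)
    phi1 = [sum(q * abs(a - b) for b, q in pairs) for a, _ in pairs]
    zeta1 = sum(p * f ** 2 for (_, p), f in zip(pairs, phi1)) - delta ** 2
    zeta2 = sum(p * q * (a - b) ** 2 for a, p in pairs for b, q in pairs) - delta ** 2
    return delta, zeta1, zeta2


def exact_variance(n, values, probs):
    """Exact variance of d, by enumerating every sample of size n with its probability."""
    first = second = Fraction(0)
    for idx in product(range(len(values)), repeat=n):
        w = Fraction(1)
        for i in idx:
            w *= probs[i]
        d = mean_difference([values[i] for i in idx])
        first += w * d
        second += w * d * d
    return second - first ** 2


values = [0, 1, 2]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # hypothetical population
delta, z1, z2 = population_constants(values, probs)
n = 4
formula = Fraction(2, n * (n - 1)) * (2 * z1 * (n - 2) + z2)
print(formula == exact_variance(n, values, probs))  # True
```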

If $Y_1, \dots, Y_n$ do not assume negative values, the coefficient of concentration (cf. Gini [8]) is defined by

$$G = \frac{d}{2\bar{Y}},$$

where $\bar{Y} = \sum Y_\alpha / n$. $G$ is a function of two U-statistics. If the $Y_\alpha$'s are identically distributed, if $E\{Y^2\}$ exists, and if $\mu = E\{Y\} > 0$, then, by Theorem 7.5, $\sqrt{n}\,(G - \delta/2\mu)$ tends to be normally distributed with mean 0 and variance

$$\frac{\delta^2}{4\mu^4}\zeta_1(\mu) - \frac{\delta}{\mu^3}\zeta_1(\mu, \delta) + \frac{1}{\mu^2}\zeta_1(\delta),$$

where

$$\zeta_1(\mu) = \int y^2\,dF(y) - \mu^2 = \sigma^2(Y),$$

$$\zeta_1(\mu, \delta) = \iint y_1\,|y_1 - y_2|\,dF(y_1)\,dF(y_2) - \mu\delta,$$

and $\zeta_1(\delta)$ is given by (9.1).
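Theorem 7.5 turns the asymptotic variance of $G$ into a finite sum. As an illustration only, assume $Y$ is exponential with mean 1, so that $\mu = \delta = 1$. Using $E|y - Y| = y - 1 + 2e^{-y}$, our own hand computation (not from the text) gives $\zeta_1(\mu) = 1$, $\zeta_1(\mu, \delta) = 1/2$ and $\zeta_1(\delta) = 1/3$. The sketch below plugs these assumed values into the general formula of Theorem 7.5, with $m = 1$ for $\bar Y$ and $m = 2$ for $d$. It then compares the result with the closed expression above.

```python
from fractions import Fraction


def delta_method_variance(grad, degrees, zeta1):
    """Theorem 7.5: sum over (gamma, delta) of m(gamma) m(delta) h_gamma h_delta zeta_1^{(gamma,delta)}."""
    g = len(grad)
    return sum(degrees[i] * degrees[j] * grad[i] * grad[j] * zeta1[i][j]
               for i in range(g) for j in range(g))


# Assumed population: Y ~ Exp(1), so mu = delta = 1.  The zeta values below are our own
# hand derivation from E|y - Y| = y - 1 + 2 exp(-y).
mu, delta = Fraction(1), Fraction(1)
zeta1 = [[Fraction(1), Fraction(1, 2)],     # zeta_1(mu),        zeta_1(mu, delta)
         [Fraction(1, 2), Fraction(1, 3)]]  # zeta_1(mu, delta), zeta_1(delta)
grad = [-delta / (2 * mu ** 2), 1 / (2 * mu)]  # partial derivatives of h(mu, delta) = delta / (2 mu)
degrees = [1, 2]                               # degree 1 for Y-bar, degree 2 for d
print(delta_method_variance(grad, degrees, zeta1))  # 1/12
```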

(c) Functions of ranks and of the signs of variate differences. Let $s(u)$ be the signum function,

$$s(u) = \begin{cases} -1 & \text{if } u < 0, \\ 0 & \text{if } u = 0, \\ 1 & \text{if } u > 0, \end{cases} \tag{9.3}$$

and let

$$c(u) = \tfrac{1}{2}\{1 + s(u)\} = \begin{cases} 0 & \text{if } u < 0, \\ \tfrac{1}{2} & \text{if } u = 0, \\ 1 & \text{if } u > 0. \end{cases} \tag{9.4}$$

If

$$X_\alpha = (x_\alpha^{(1)}, \dots, x_\alpha^{(r)}), \qquad (\alpha = 1, \dots, n)$$

is a sample of $n$ vectors of $r$ components, we may define the rank $R_\alpha^{(i)}$ of $x_\alpha^{(i)}$ by


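$$R_\alpha^{(i)} = \frac{1}{2} + \sum_{\beta=1}^{n} c(x_\alpha^{(i)} - x_\beta^{(i)}) = \frac{n+1}{2} + \frac{1}{2}\sum_{\beta=1}^{n} s(x_\alpha^{(i)} - x_\beta^{(i)}), \qquad (i = 1, \dots, r). \tag{9.5}$$

If the numbers $x_1^{(i)}, x_2^{(i)}, \dots, x_n^{(i)}$ are all different, the smallest of them has rank 1, the next smallest rank 2, etc. If some of them are equal, the rank as defined by (9.5) is known as the mid-rank.

Any function of the ranks is a function of expressions $c(x_\alpha^{(i)} - x_\beta^{(i)})$ or $s(x_\alpha^{(i)} - x_\beta^{(i)})$.

Conversely, since

$$s(x_\alpha^{(i)} - x_\beta^{(i)}) = s(R_\alpha^{(i)} - R_\beta^{(i)}),$$

any function of expressions $s(x_\alpha^{(i)} - x_\beta^{(i)})$ or $c(x_\alpha^{(i)} - x_\beta^{(i)})$ is a function of the ranks.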
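The two forms of (9.5) are straightforward to compute. The following sketch returns mid-ranks from $c(u)$ and from $s(u)$, using `fractions.Fraction` so that the halves are exact. The function names are our own.

```python
from fractions import Fraction


def sign(u):
    """s(u) of (9.3)."""
    return (u > 0) - (u < 0)


def c(u):
    """c(u) = (1 + s(u)) / 2 of (9.4)."""
    return Fraction(1 + sign(u), 2)


def mid_ranks(xs):
    """First form of (9.5): R_a = 1/2 + sum over all b (including b = a) of c(x_a - x_b)."""
    return [Fraction(1, 2) + sum(c(a - b) for b in xs) for a in xs]


def mid_ranks_sign(xs):
    """Second form of (9.5): R_a = (n + 1)/2 + (1/2) sum over b of s(x_a - x_b)."""
    n = len(xs)
    return [Fraction(n + 1, 2) + Fraction(sum(sign(a - b) for b in xs), 2) for a in xs]


print([str(r) for r in mid_ranks([10, 20, 20, 30])])  # ['1', '5/2', '5/2', '4']
```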

Consider a regular functional $\theta(F)$ whose kernel $\Phi(x_1, \dots, x_m)$ depends only on the signs of the variate differences,

$$s(x_\alpha^{(i)} - x_\beta^{(i)}), \qquad (\alpha, \beta = 1, \dots, m;\ i = 1, \dots, r). \tag{9.6}$$

The corresponding U-statistic is a function of the ranks of the sample variates. The function $\Phi$ can take only a finite number of values, $c_1, \dots, c_N$, say. If $\pi_i = P\{\Phi = c_i\}$, $(i = 1, \dots, N)$, we have

$$\theta = c_1\pi_1 + \dots + c_N\pi_N, \qquad \sum_{i=1}^{N}\pi_i = 1.$$

$\pi_i$ is a regular functional whose kernel $\Phi_i(x_1, \dots, x_m)$ is equal to 1 or 0 according to whether $\Phi = c_i$ or $\Phi \ne c_i$. We have

$$\Phi = c_1\Phi_1 + \dots + c_N\Phi_N .$$

In order that $\theta(F)$ exist, the $c_i$ must be finite, and hence $\Phi$ is bounded. Therefore, $E\{\Phi^2\}$ exists, and if $X_1, X_2, \dots$ are identically distributed, the d.f. of $\sqrt{n}\,(U - \theta)$ tends, by Theorem 7.1, to a normal d.f. which is non-singular if $\zeta_1 > 0$.

In the following we shall consider several examples of such functionals.

In the following we shall consider several examples of such functionals. 
(d) Difference sign correlation. Consider the bivariate sample

$$(x_1^{(1)}, x_1^{(2)}), (x_2^{(1)}, x_2^{(2)}), \dots, (x_n^{(1)}, x_n^{(2)}). \tag{9.7}$$

To each two members of this sample corresponds a pair of signs of the differences of the respective variables,

$$s(x_\alpha^{(1)} - x_\beta^{(1)}), \ s(x_\alpha^{(2)} - x_\beta^{(2)}), \qquad (\alpha \ne \beta;\ \alpha, \beta = 1, \dots, n). \tag{9.8}$$

(9.8) is a population of $n(n-1)$ pairs of difference signs. Since

$$\sum_{\alpha \ne \beta} s(x_\alpha^{(i)} - x_\beta^{(i)}) = 0, \qquad (i = 1, 2),$$

the covariance $t$ of the difference signs (9.8) is

$$t = \frac{1}{n(n-1)}\sum_{\alpha \ne \beta} s(x_\alpha^{(1)} - x_\beta^{(1)})\,s(x_\alpha^{(2)} - x_\beta^{(2)}). \tag{9.9}$$

$t$ will be briefly referred to as the difference sign covariance of the sample (9.7). If all $x^{(1)}$'s and all $x^{(2)}$'s are different, we have

$$\sum_{\alpha \ne \beta} s^2(x_\alpha^{(i)} - x_\beta^{(i)}) = n(n-1), \qquad (i = 1, 2),$$

and then $t$ is the product moment correlation of the difference signs.

It is easily seen that $t$ is a linear function of the number of inversions in the permutation of the ranks of $x^{(1)}$ and $x^{(2)}$.

The statistic $t$ has been considered by Esscher [6], Lindeberg [15], [16], Kendall [12], and others.
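For a sample without ties, $t$ can be written as $1 - 4I/n(n-1)$, where $I$ is the number of inversions. This explicit form is our own elementary rewriting of the remark above, not a formula from the text. The sketch computes $t$ directly from (9.9) and compares it with that form. The data and helper names are hypothetical.

```python
from fractions import Fraction


def sign(u):
    return (u > 0) - (u < 0)


def diff_sign_cov(x1, x2):
    """Difference sign covariance t of (9.9)."""
    n = len(x1)
    total = sum(sign(x1[a] - x1[b]) * sign(x2[a] - x2[b])
                for a in range(n) for b in range(n) if a != b)
    return Fraction(total, n * (n - 1))


def inversions(x1, x2):
    """Pairs ordered oppositely by the two variables (meaningful when there are no ties)."""
    n = len(x1)
    return sum(1 for a in range(n) for b in range(a + 1, n)
               if (x1[a] - x1[b]) * (x2[a] - x2[b]) < 0)


x1, x2 = [1, 2, 3, 4, 5], [3, 1, 4, 5, 2]
n = len(x1)
print(diff_sign_cov(x1, x2), 1 - Fraction(4 * inversions(x1, x2), n * (n - 1)))  # 1/5 1/5
```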

$t$ is a U-statistic. As a function of a random sample from a bivariate population, $t$ is an unbiased estimate of the regular functional of degree 2,

$$\tau = \iint s(x_1^{(1)} - x_2^{(1)})\,s(x_1^{(2)} - x_2^{(2)})\,dF(x_1)\,dF(x_2). \tag{9.10}$$

$\tau$ is the covariance of the signs of differences of the corresponding components of $X_1 = (X_1^{(1)}, X_1^{(2)})$ and $X_2 = (X_2^{(1)}, X_2^{(2)})$ in the population of pairs of independent vectors $X_1, X_2$ with identical d.f. $F(x) = F(x^{(1)}, x^{(2)})$. If $F(x^{(1)}, x^{(2)})$ is continuous, $\tau$ is the product moment correlation of the difference signs.

Two points (or vectors), $(x_1^{(1)}, x_1^{(2)})$ and $(x_2^{(1)}, x_2^{(2)})$, are called concordant or discordant according to whether

$$(x_1^{(1)} - x_2^{(1)})(x_1^{(2)} - x_2^{(2)})$$

is positive or negative. If $\pi^{(c)}$ and $\pi^{(d)}$ are the probabilities that a pair of vectors drawn at random from the population is concordant or discordant, respectively, we have from (9.10)

$$\tau = \pi^{(c)} - \pi^{(d)}.$$

If $F(x^{(1)}, x^{(2)})$ is continuous, we have $\pi^{(c)} + \pi^{(d)} = 1$, and hence

$$\tau = 2\pi^{(c)} - 1 = 1 - 2\pi^{(d)}. \tag{9.11}$$
If we put

$$\bar F(x^{(1)}, x^{(2)}) = \tfrac{1}{4}\{F(x^{(1)} - 0, x^{(2)} - 0) + F(x^{(1)} - 0, x^{(2)} + 0) + F(x^{(1)} + 0, x^{(2)} - 0) + F(x^{(1)} + 0, x^{(2)} + 0)\}, \tag{9.12}$$

we have

$$\Phi_1(x \mid \tau) = 1 - 2\bar F(x^{(1)}, \infty) - 2\bar F(\infty, x^{(2)}) + 4\bar F(x^{(1)}, x^{(2)}), \tag{9.13}$$

and we may write

$$\tau = E\{\Phi_1(X_1 \mid \tau)\}. \tag{9.14}$$

The variance of $t$ is, by (5.13),

$$\sigma^2(t) = \frac{2}{n(n-1)}\{2\zeta_1(\tau)(n-2) + \zeta_2(\tau)\}, \tag{9.15}$$

where

$$\zeta_1(\tau) = E\{\Phi_1^2(X_1 \mid \tau)\} - \tau^2, \tag{9.16}$$

$$\zeta_2(\tau) = E\{s^2(X_1^{(1)} - X_2^{(1)})\,s^2(X_1^{(2)} - X_2^{(2)})\} - \tau^2. \tag{9.17}$$

If $F(x^{(1)}, x^{(2)})$ is continuous, we have $\zeta_2(\tau) = 1 - \tau^2$, and $\bar F(x^{(1)}, x^{(2)})$ in (9.13) may be replaced by $F(x^{(1)}, x^{(2)})$.

The variance of a linear function of $t$ has been given for the continuous case by Lindeberg [15], [16].

If $X^{(1)}$ and $X^{(2)}$ are independent and have a continuous d.f., we find $\zeta_1(\tau) = \tfrac{1}{9}$, $\zeta_2(\tau) = 1$, and hence

$$\sigma^2(t) = \frac{2(2n+5)}{9n(n-1)}. \tag{9.18}$$

In this case the distribution of $t$ is independent of the univariate distributions of $X^{(1)}$ and $X^{(2)}$. This is, however, no longer true if the independent variables are discontinuous. Then it appears that $\sigma^2(t)$ depends on $P\{X_1^{(i)} = X_2^{(i)}\}$ and $P\{X_1^{(i)} = X_2^{(i)} = X_3^{(i)}\}$, $(i = 1, 2)$.
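Under independence with continuous marginals, all $n!$ rank permutations are equally likely. So (9.18) can be confirmed for small $n$ by exact enumeration. The sketch below does this with rational arithmetic. It checks $n \le 6$ only, and the helper names are our own.

```python
from fractions import Fraction
from itertools import permutations


def sign(u):
    return (u > 0) - (u < 0)


def diff_sign_cov(x1, x2):
    """Difference sign covariance t of (9.9)."""
    n = len(x1)
    total = sum(sign(x1[a] - x1[b]) * sign(x2[a] - x2[b])
                for a in range(n) for b in range(n) if a != b)
    return Fraction(total, n * (n - 1))


def exact_var_t(n):
    """Exact sigma^2(t) when all n! permutations of the ranks are equally likely."""
    x1 = list(range(n))
    ts = [diff_sign_cov(x1, list(p)) for p in permutations(range(n))]
    mean = sum(ts, Fraction(0)) / len(ts)
    return sum(((t - mean) ** 2 for t in ts), Fraction(0)) / len(ts)


for n in range(2, 7):
    print(n, exact_var_t(n), Fraction(2 * (2 * n + 5), 9 * n * (n - 1)))
```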

By Theorem 7.1, the d.f. of $\sqrt{n}\,(t - \tau)$ tends to the normal form. This result was first obtained for the particular case that all permutations of the ranks of $X^{(1)}$ and $X^{(2)}$ are equally probable, which corresponds to the independence of the continuous random variables $X^{(1)}$, $X^{(2)}$ (Kendall [12]). In this case $t$ can be represented as a sum of independent random variables (cf. Dantzig [5] and Feller [7]). In the general case the asymptotic normality of $t$ has been shown by Daniels and Kendall [4] and the author [10].

The functional $\tau(F)$ is stationary (and hence the normal limiting distribution of $\sqrt{n}\,(t - \tau)$ singular) if $\zeta_1 = 0$, which, in the case of a continuous $F$, means that the equation $\Phi_1(X \mid \tau) = \tau$, or

$$4F(X^{(1)}, X^{(2)}) = 2F(X^{(1)}, \infty) + 2F(\infty, X^{(2)}) - 1 + \tau, \tag{9.19}$$
is satisfied with probability 1. This is the case if $X^{(2)}$ is an increasing function of $X^{(1)}$. Then $t = \tau = 1$ with probability 1, and $\sigma^2(t) = 0$. A case where (9.19) is fulfilled and $\sigma^2(t) > 0$ is the following: $X^{(1)}$ is uniformly distributed in the interval $(0, 1)$, and

$$X^{(2)} = X^{(1)} + \tfrac{1}{2} \ \text{ if } 0 < X^{(1)} < \tfrac{1}{2}, \qquad X^{(2)} = X^{(1)} - \tfrac{1}{2} \ \text{ if } \tfrac{1}{2} < X^{(1)} < 1. \tag{9.20}$$

In this case $\tau = 0$, $\zeta_2 = 1$, $\sigma^2(t) = 2/(n(n-1))$.

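(e) Rank correlation and grade correlation. If in the sample $(x_\alpha^{(1)}, x_\alpha^{(2)})$, $(\alpha = 1, \dots, n)$, all $x^{(1)}$'s and all $x^{(2)}$'s are different, the rank correlation coefficient, which we denote by $k'$, is given by

$$k' = \frac{12}{n^3 - n}\sum_{\alpha=1}^{n}\left(R_\alpha^{(1)} - \frac{n+1}{2}\right)\left(R_\alpha^{(2)} - \frac{n+1}{2}\right).$$

Inserting (9.5) we have

$$k' = \frac{3}{n^3 - n}\sum_{\alpha=1}^{n}\sum_{\beta=1}^{n}\sum_{\gamma=1}^{n} s(x_\alpha^{(1)} - x_\beta^{(1)})\,s(x_\alpha^{(2)} - x_\gamma^{(2)}),$$

or

$$k' = \frac{(n-2)\,k + 3t}{n+1}, \tag{9.21}$$

where $t$ is the difference sign covariance (9.9), and

$$k = \frac{3}{n(n-1)(n-2)}\sum s(x_\alpha^{(1)} - x_\beta^{(1)})\,s(x_\alpha^{(2)} - x_\gamma^{(2)}),$$

the summation being over all different subscripts $\alpha, \beta, \gamma$.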
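Identity (9.21) is purely algebraic, so it can be checked on any sample with distinct values. In the sketch below the data and helper names are hypothetical. $k'$ is computed from the ranks, and $t$ and $k$ from their defining sums.

```python
from fractions import Fraction
from itertools import permutations


def sign(u):
    return (u > 0) - (u < 0)


def ranks(xs):
    """Ranks 1..n, assuming all values are distinct."""
    return [1 + sum(1 for b in xs if b < a) for a in xs]


def rank_corr(x1, x2):
    """k' = 12 / (n^3 - n) * sum of (R1 - (n+1)/2)(R2 - (n+1)/2)."""
    n = len(x1)
    mid = Fraction(n + 1, 2)
    return Fraction(12, n ** 3 - n) * sum((a - mid) * (b - mid)
                                          for a, b in zip(ranks(x1), ranks(x2)))


def t_stat(x1, x2):
    """t of (9.9)."""
    n = len(x1)
    return Fraction(sum(sign(x1[a] - x1[b]) * sign(x2[a] - x2[b])
                        for a, b in permutations(range(n), 2)), n * (n - 1))


def k_stat(x1, x2):
    """k of (9.21): sum over all different subscripts alpha, beta, gamma."""
    n = len(x1)
    total = sum(sign(x1[a] - x1[b]) * sign(x2[a] - x2[g]) for a, b, g in permutations(range(n), 3))
    return Fraction(3 * total, n * (n - 1) * (n - 2))


x1, x2 = [1.2, 3.4, 2.2, 5.0, 4.1, 0.3], [2.0, 1.5, 3.3, 4.8, 0.7, 2.9]
n = len(x1)
print(rank_corr(x1, x2) == ((n - 2) * k_stat(x1, x2) + 3 * t_stat(x1, x2)) / (n + 1))  # True
```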

$k$ is a U-statistic, and as a function of a random sample from a population with d.f. $F$, $k$ is an unbiased estimate of the regular functional of degree 3,

$$\kappa = 3\iiint s(x_1^{(1)} - x_2^{(1)})\,s(x_1^{(2)} - x_3^{(2)})\,dF(x_1)\,dF(x_2)\,dF(x_3) = 3\int\{2F^{(1)}(x^{(1)}) - 1\}\{2F^{(2)}(x^{(2)}) - 1\}\,dF(x), \tag{9.22}$$

where $F^{(1)}(x^{(1)}) = \bar F(x^{(1)}, \infty)$, $F^{(2)}(x^{(2)}) = \bar F(\infty, x^{(2)})$, with $\bar F$ defined by (9.12).

If $F$ is continuous, we have

$$\int F^{(i)}(y)\,dF^{(i)}(y) = \int_0^1 u\,du = \tfrac{1}{2}, \qquad \int\{F^{(i)}(y) - \tfrac{1}{2}\}^2\,dF^{(i)}(y) = \int_0^1 (u - \tfrac{1}{2})^2\,du = \tfrac{1}{12}, \qquad (i = 1, 2),$$

and in this case $\kappa$ is the coefficient of correlation between the random variables $U^{(1)} = F^{(1)}(X^{(1)})$, $U^{(2)} = F^{(2)}(X^{(2)})$.





$U^{(i)}$ has been termed the grade of the continuous variable $X^{(i)}$, and in the general case $F^{(i)}(X^{(i)})$ may be called the grade of $X^{(i)}$ (cf., for instance, G. U. Yule and M. G. Kendall [22, p. 150]). In general, $\kappa$ is 12 times the covariance of the grades.

From (9.21) we have for the expected value of $k'$,

$$E\{k'\} = \frac{(n-2)\kappa + 3\tau}{n+1}.$$

In the continuous case the rank correlation coefficient $k'$ is an estimate of the grade correlation $\kappa$, which is biased for finite $n$ but unbiased in the limit.

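The kernel $3s(x_1^{(1)} - x_2^{(1)})\, s(x_1^{(2)} - x_3^{(2)})$ of $\kappa$ is not symmetric. Denoting by $\Phi(x_1, x_2, x_3 \mid \kappa)$ the symmetric kernel of $\kappa$, we have

$$\Phi(x_1, x_2, x_3 \mid \kappa) = \frac12 \sum s(x_\alpha^{(1)} - x_\beta^{(1)})\, s(x_\alpha^{(2)} - x_\gamma^{(2)}), \qquad (9.23)$$

the summation extending over the six permutations $(\alpha, \beta, \gamma)$ of $(1, 2, 3)$.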

For computing $\kappa$ and the constants $\zeta$, an alternative expression for $\kappa$ and $\Phi$ is sometimes more convenient. From three two-dimensional vectors $x_1, x_2, x_3$ we can form three pairs $(x_1, x_2)$, $(x_1, x_3)$, and $(x_2, x_3)$. The number of concordant pairs among them can be 3, 2, 1, or 0. If $\gamma$ is the probability that among the three pairs formed from three random elements of the population at least 2 are concordant, we have, if the d.f. $F$ is continuous,

$$\kappa = 2\gamma - 1. \qquad (9.24)$$

This is analogous to the expression (9.11) for $\tau$.

The truth of (9.24) can be seen as follows. From the definition of $\gamma$ we have

$$\gamma = E\{\Phi(X_1, X_2, X_3 \mid \gamma)\},$$

where $\Phi(x_1, x_2, x_3 \mid \gamma)$ is equal to 1 if at least two of the three expressions

$$(x_\alpha^{(1)} - x_\beta^{(1)})(x_\alpha^{(2)} - x_\beta^{(2)}), \qquad (\alpha < \beta;\ \alpha, \beta = 1, 2, 3) \qquad (9.25)$$

are positive, and equal to zero if no more than one of them is positive. Since, by the continuity of $F$, we may neglect the case of (9.25) being zero, we may write

$$\Phi(x_1, x_2, x_3 \mid \gamma) = c_{12,12}\, c_{23,23}\, c_{31,31} + c_{12,12}\, c_{23,23}\, c_{31,13} + c_{12,12}\, c_{23,32}\, c_{31,31} + c_{12,21}\, c_{23,23}\, c_{31,31},$$

where

$$c_{\alpha\beta,\gamma\delta} = c\bigl[(x_\alpha^{(1)} - x_\beta^{(1)})(x_\gamma^{(2)} - x_\delta^{(2)})\bigr]$$

and $c(u)$ is defined by (9.4). $\Phi(x_1, x_2, x_3 \mid \gamma)$ is symmetric in $x_1, x_2, x_3$.

The identity

$$\Phi(x_1, x_2, x_3 \mid \kappa) = 2\Phi(x_1, x_2, x_3 \mid \gamma) - 1 \qquad (9.26)$$

can be shown to hold either by algebraical calculation using (9.4) or by direct computation of each side for the different positions of the three points $x_1, x_2, x_3$.

From (9.26) it appears that in the continuous case the symmetric kernel $\Phi(x_1, x_2, x_3 \mid \kappa)$ can assume only two values, $-1$ and $+1$.
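Because both kernels in (9.26) depend only on the relative order of the coordinates, checking the six possible rank configurations of three points with distinct coordinates is exhaustive. A minimal Python sketch of that check follows; `phi_kappa` implements (9.23) and `phi_gamma` the indicator defining $\gamma$ (the names are our own).

```python
from itertools import permutations


def s(u):
    """Sign function s(u): +1, 0 or -1."""
    return (u > 0) - (u < 0)


def phi_kappa(p):
    """Symmetric kernel (9.23): half the sum over the six orderings (alpha, beta, gamma)."""
    return sum(s(p[a][0] - p[b][0]) * s(p[a][1] - p[c][1])
               for a, b, c in permutations(range(3))) / 2


def phi_gamma(p):
    """1 if at least two of the three pairs are concordant, else 0."""
    concordant = sum((p[a][0] - p[b][0]) * (p[a][1] - p[b][1]) > 0
                     for a, b in ((0, 1), (0, 2), (1, 2)))
    return 1 if concordant >= 2 else 0


# x-coordinates 1, 2, 3; the y-coordinates run through all six rank orders
for ys in permutations((1, 2, 3)):
    pts = list(zip((1, 2, 3), ys))
    print(ys, phi_kappa(pts), 2 * phi_gamma(pts) - 1)
```

In every row the two printed values coincide and equal $\pm 1$, which is the two-valued property stated above.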

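The variance of $k$ is, according to (5.13),

$$\sigma^2(k) = \frac{6}{n(n-1)(n-2)} \Bigl\{ \tfrac32 (n-3)(n-4)\, \zeta_1(\kappa) + 3(n-3)\, \zeta_2(\kappa) + \zeta_3(\kappa) \Bigr\},$$

where

$$\zeta_1(\kappa) = E\{\Phi_1^2(X_1 \mid \kappa)\} - \kappa^2, \qquad \zeta_2(\kappa) = E\{\Phi_2^2(X_1, X_2 \mid \kappa)\} - \kappa^2, \qquad \zeta_3(\kappa) = E\{\Phi^2(X_1, X_2, X_3 \mid \kappa)\} - \kappa^2,$$

$$\Phi_1(x_1 \mid \kappa) = E\{\Phi(x_1, X_2, X_3 \mid \kappa)\}, \qquad \Phi_2(x_1, x_2 \mid \kappa) = E\{\Phi(x_1, x_2, X_3 \mid \kappa)\}.$$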

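We find for the continuous case

$$\begin{aligned}
\zeta_3(\kappa) &= 1 - \kappa^2, \\
\Phi_1(x_1 \mid \kappa) &= [1 - 2F(x_1^{(1)}, \infty)][1 - 2F(\infty, x_1^{(2)})] - 2F(x_1^{(1)}, \infty) - 2F(\infty, x_1^{(2)}) \\
&\quad + 4\int F(x_1^{(1)}, y^{(2)})\, dF(\infty, y^{(2)}) + 4\int F(y^{(1)}, x_1^{(2)})\, dF(y^{(1)}, \infty), \\
\Phi_2(x_1, x_2 \mid \kappa) &= 1 + 2F(x_1^{(1)}, x_2^{(2)}) + 2F(x_2^{(1)}, x_1^{(2)}) - 2c(x_2^{(2)} - x_1^{(2)})\, F(x_1^{(1)}, \infty) \\
&\quad - 2c(x_1^{(2)} - x_2^{(2)})\, F(x_2^{(1)}, \infty) - 2c(x_2^{(1)} - x_1^{(1)})\, F(\infty, x_1^{(2)}) \\
&\quad - 2c(x_1^{(1)} - x_2^{(1)})\, F(\infty, x_2^{(2)}).
\end{aligned} \qquad (9.27)$$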


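If $X^{(1)}$, $X^{(2)}$ are continuous and independent, we obtain $\kappa = 0$, $\zeta_1(\kappa) = \tfrac19$, $\zeta_2(\kappa) = \tfrac7{18}$, $\zeta_3(\kappa) = 1$, and hence

$$\sigma^2(k) = \frac{n^2 - 3}{n(n-1)(n-2)}. \qquad (9.28)$$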


In the discontinuous case of independence the distribution of $k$, as that of $t$, depends on the distributions of $X^{(1)}$ and $X^{(2)}$, and $\sigma^2(k)$ can again be expressed in terms of $P\{X_1^{(i)} = X_2^{(i)}\}$ and $P\{X_1^{(i)} = X_2^{(i)} = X_3^{(i)}\}$, $(i = 1, 2)$.

The variance of the rank correlation coefficient $k'$ is, by (9.21),

$$\sigma^2(k') = \frac{(n-2)^2 \sigma^2(k) + 6(n-2)\, \sigma(t, k) + 9\sigma^2(t)}{(n+1)^2}. \qquad (9.29)$$
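For $\sigma(t, k)$ we have, according to (6.5),

$$\sigma(t, k) = \frac{6}{n(n-1)} \{(n-3)\, \zeta_1(\tau, \kappa) + \zeta_2(\tau, \kappa)\},$$

where

$$\zeta_1(\tau, \kappa) = E\{\Phi_1(X_1 \mid \tau)\, \Phi_1(X_1 \mid \kappa)\} - \tau\kappa, \qquad \zeta_2(\tau, \kappa) = E\{\Phi(X_1, X_2 \mid \tau)\, \Phi_2(X_1, X_2 \mid \kappa)\} - \tau\kappa.$$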

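In the case of independence we see from (9.13) and (9.27) that

$$\Phi_1(x \mid \tau) = \Phi_1(x \mid \kappa) = [1 - 2F(x^{(1)}, \infty)][1 - 2F(\infty, x^{(2)})],$$

and we obtain

$$\zeta_1(\tau, \kappa) = \zeta_1(\kappa) = \zeta_1(\tau) = \tfrac19, \qquad \zeta_2(\tau, \kappa) = \tfrac59, \qquad (9.30)$$

$$\sigma(t, k) = \frac{2(n+2)}{3n(n-1)}. \qquad (9.31)$$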


On inserting (9.28), (9.31) and (9.18) in (9.29), we find

$$\sigma^2(k') = \frac{1}{n-1},$$

in accordance with the result obtained for this case by Student and published by K. Pearson [20].
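The arithmetic leading to $\sigma^2(k') = 1/(n-1)$ can be reproduced with the general variance and covariance formulas for U-statistics. The sketch below assumes that (5.13) and (6.5) take the standard form $\binom{n}{m}^{-1}\sum_c \binom{m}{c}\binom{n-m}{m-c}\zeta_c$ (and, for degrees $m \le m_2$, the analogue with $\binom{m_2}{c}\binom{n-m_2}{m-c}$). It also uses $\zeta_2(\tau) = 1$, which holds here because $\tau = 0$ and the kernel of $t$ takes only the values $\pm 1$.

```python
from fractions import Fraction
from math import comb


def var_u(n, m, zeta):
    """Variance of a U-statistic of degree m; zeta[c - 1] holds zeta_c."""
    total = sum(comb(m, c) * comb(n - m, m - c) * zeta[c - 1]
                for c in range(1, m + 1))
    return Fraction(total) / comb(n, m)


def cov_u(n, m, m2, zeta):
    """Covariance of U-statistics of degrees m <= m2; zeta[c - 1] holds the mixed zeta_c."""
    total = sum(comb(m2, c) * comb(n - m2, m - c) * zeta[c - 1]
                for c in range(1, m + 1))
    return Fraction(total) / comb(n, m)


# Continuous case of independence: constants quoted in the text above
zeta_tau = [Fraction(1, 9), Fraction(1)]                       # zeta_1(tau), zeta_2(tau)
zeta_kappa = [Fraction(1, 9), Fraction(7, 18), Fraction(1)]    # zeta_c(kappa), c = 1, 2, 3
zeta_mixed = [Fraction(1, 9), Fraction(5, 9)]                  # zeta_c(tau, kappa), c = 1, 2


def var_k_prime(n):
    """sigma^2(k') from (9.29)."""
    var_t = var_u(n, 2, zeta_tau)
    var_k = var_u(n, 3, zeta_kappa)
    cov_tk = cov_u(n, 2, 3, zeta_mixed)
    return ((n - 2) ** 2 * var_k + 6 * (n - 2) * cov_tk + 9 * var_t) / (n + 1) ** 2


for n in (5, 10, 20):
    print(n, var_u(n, 3, zeta_kappa), cov_u(n, 2, 3, zeta_mixed), var_k_prime(n))
```

With these constants `var_u(n, 2, zeta_tau)` reproduces the familiar null variance $2(2n+5)/\{9n(n-1)\}$ of $t$, and the three pieces combine to $1/(n-1)$ exactly.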

According to Theorem 7.1, $\sqrt{n}\,(k - \kappa)$ tends to be normally distributed with mean 0 and variance $9\zeta_1(\kappa)$. The same is true for the distribution of the rank correlation coefficient $k'$, as follows from Theorem 7.3 in conjunction with (9.21). For the special case of independence the asymptotic normality of $k'$ has been proved by Hotelling and Pabst [11].

From Theorem 7.3 it also follows that the joint distribution of $\sqrt{n}\,(t - \tau)$ and $\sqrt{n}\,(k - \kappa)$ (or $\sqrt{n}\,(k' - \kappa)$) tends to the normal form with the variances $4\zeta_1(\tau)$ and $9\zeta_1(\kappa)$ and the covariance $6\zeta_1(\tau, \kappa)$. In the case of independence we see from (9.30) that the correlation $\rho(t, k)$ between $t$ and $k$ tends to 1, and we have the asymptotic functional relation $3t = 2k$. This result has been conjectured by Kendall and others [14], and proved by Daniels [3]. In general, however, $\rho(t, k)$ does not approach unity. Thus, if $X^{(1)}$ is uniformly distributed in $(0, 1)$, and

$$X^{(2)} = \begin{cases} \tfrac12 - X^{(1)} & \text{if } 0 < X^{(1)} < \tfrac14, \\ \tfrac12 + X^{(1)} & \text{if } \tfrac14 < X^{(1)} < \tfrac12, \\ X^{(1)} - \tfrac12 & \text{if } \tfrac12 < X^{(1)} < \tfrac34, \\ \tfrac32 - X^{(1)} & \text{if } \tfrac34 < X^{(1)} < 1, \end{cases} \qquad (9.32)$$

we have $\tau = \kappa = 0$, $\zeta_1(\tau) = 0$, $\zeta_2(\tau) = 1$, $\zeta_1(\kappa) = \tfrac1{16}$, $\zeta_1(\kappa, \tau) = 0$, and hence

$$\rho(t, k) \to 0.$$
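The claims $\tau = \kappa = 0$ and $\zeta_1(\tau) = 0$ for (9.32) can be illustrated on an evenly spaced grid of $X^{(1)}$ values, used here as a deterministic stand-in for a uniform sample (an assumption of this sketch, not the paper's construction). On such a grid with $N$ a multiple of 4, the statistics $t$ and $k'$ come out exactly 0, and the empirical analogue of $\Phi_1(x \mid \tau)$ equals $\pm 1/(N-1)$ at every point, even though $X^{(2)}$ is a function of $X^{(1)}$.

```python
from fractions import Fraction


def s(u):
    """Sign function s(u): +1, 0 or -1."""
    return (u > 0) - (u < 0)


def x2_from_x1(x):
    """X^(2) as the function of X^(1) given in (9.32)."""
    if x < 0.25:
        return 0.5 - x
    if x < 0.5:
        return 0.5 + x
    if x < 0.75:
        return x - 0.5
    return 1.5 - x


def ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


N = 400                                    # grid size, a multiple of 4
x1 = [(i + 0.5) / N for i in range(N)]     # evenly spaced stand-in for a uniform sample
x2 = [x2_from_x1(x) for x in x1]

# row sums of s(.)s(.) give the empirical Phi_1(x_a | tau); their total gives t of (9.9)
rows = [sum(s(x1[a] - x1[b]) * s(x2[a] - x2[b]) for b in range(N) if b != a)
        for a in range(N)]
phi1 = [Fraction(r, N - 1) for r in rows]
t = Fraction(sum(rows), N * (N - 1))

r1, r2 = ranks(x1), ranks(x2)
m = Fraction(N + 1, 2)
k_prime = Fraction(12, N ** 3 - N) * sum((a - m) * (b - m) for a, b in zip(r1, r2))

print(t, k_prime, max(abs(p) for p in phi1))
```

The near-constant $\Phi_1(\cdot \mid \tau)$ is the finite-sample face of $\zeta_1(\tau) = 0$: the linear part of $t$ vanishes, which is why $\rho(t, k)$ cannot approach 1 here.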
(f) Non-parametric tests of independence. Suppose that the random variables $X^{(1)}$, $X^{(2)}$ have a continuous joint d.f. $F(x^{(1)}, x^{(2)})$, and we want to test the hypothesis $H_0$ that $X^{(1)}$ and $X^{(2)}$ are independent, that is, that $F(x^{(1)}, x^{(2)}) = F(x^{(1)}, \infty)\, F(\infty, x^{(2)})$.

The distribution of any statistic involving only the ranks of the variables does not depend on the d.f. of the population when $H_0$ is true. For this reason several rank order statistics, among them the difference sign correlation $t$ and the rank correlation $k'$, have been suggested for testing independence.

From the preceding results we can obtain the asymptotic power functions of the tests of independence based on $t$ and $k'$. If $H_0$ is true, we have $E(t) = \tau = 0$, and the critical region of size $\varepsilon$ of the $t$-test may be defined by $|t| > c_n$, where $c_n$ is the smallest number satisfying the inequality

$$P\{|t| > c_n \mid H_0\} \le \varepsilon. \qquad (9.33)$$

By Theorem 7.2 and (9.18) we may write $c_n = 2\lambda_n/(3\sqrt{n})$, where $\lambda_n$ tends to a positive constant $\lambda$ depending on $\varepsilon$.

Since $\sigma^2(t) = O(n^{-1})$, the power function

$$P_n(H) = P\{|t| > 2\lambda_n/(3\sqrt{n}) \mid H\}$$

tends to one as $n \to \infty$ for any alternative hypothesis $H$ with $\tau(F) \ne 0$. If, however, $\tau = 0$, we have $\lim P_n(H) < 1$. If $\tau = 0$ and $\zeta_1(\tau) < \tfrac19$, we have even $\lim P_n(H) < \varepsilon$, and with respect to these alternatives the test is biased in the limit. Thus, in the case of the distribution (9.20) we have even $P_n(H) \to 0$. In this case there is a functional relationship between the variables, and the distribution must be considered as considerably different from the case of independence.

For the rank correlation test we have a similar result. If $c_n$ is the smallest number satisfying $P\{|k'| > c_n \mid H_0\} \le \varepsilon$, we have $c_n = \lambda_n'/\sqrt{n}$, where $\lim \lambda_n' = \lambda$, and the test is biased in the limit if $\kappa = 0$ and $\zeta_1(\kappa) < \tfrac19$. This is fulfilled in the case of the distribution (9.32), where $\zeta_1(\kappa) = \tfrac1{16}$.
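The limiting power in these biased cases follows from the asymptotic normality quoted above. Under $H_0$, $\sqrt{n}\, t$ has limiting variance $4/9$, so $\sqrt{n}\, c_n \to 2\lambda/3$; under an alternative with $\tau = 0$ the limit of $P_n(H)$ is then $P\{|Z| > \lambda/(3\sqrt{\zeta_1(\tau)})\}$ for a standard normal $Z$, and the same expression with $\zeta_1(\kappa)$ holds for the $k'$-test. This is our own short derivation, sketched in Python with `statistics.NormalDist`; $\varepsilon = 0.05$ is an illustrative choice.

```python
from statistics import NormalDist


def limiting_power(zeta1, eps=0.05):
    """Limit of P_n(H) for the t-test (or the k'-test) when tau = 0 (or kappa = 0).

    zeta1 is zeta_1(tau) (or zeta_1(kappa)); its value under H_0 is 1/9.
    """
    z = NormalDist()
    lam = z.inv_cdf(1 - eps / 2)          # two-sided normal critical value
    if zeta1 == 0:
        return 0.0                        # sqrt(n) t -> 0 in probability
    return 2 * (1 - z.cdf(lam / (3 * zeta1 ** 0.5)))


# H_0 value; k'-test under (9.32); t-test under (9.20)
for zeta1 in (1 / 9, 1 / 16, 0):
    print(zeta1, limiting_power(zeta1))
```

At $\zeta_1 = \tfrac19$ the function returns the nominal size, and any smaller $\zeta_1$ gives a limit below it, which is the bias in the limit described above.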

The question arises whether there exist non-parametric tests of independence which are unbiased or unbiased in the limit. This point will be discussed in a separate paper on tests of independence.

(g) Mann’s test against trend. Let $Y_1, \dots, Y_n$ be $n$ independent real-valued random variables, $Y_\alpha$ having the continuous d.f. $F_\alpha(y)$, $(\alpha = 1, \dots, n)$. The hypothesis of randomness,

$$H_1: F_1(y) = \cdots = F_n(y),$$

is to be tested against the alternative hypothesis of a "downward trend,"

$$H_2: F_1(y) < F_2(y) < \cdots < F_n(y).$$

H. B. Mann [17] has suggested a test of $H_1$ against $H_2$ based on the number $T$ of inequalities $Y_\alpha < Y_\beta$, where $\alpha < \beta$. We may write

$$2T - \frac{n(n-1)}{2} = \sum_{\alpha < \beta} s(Y_\beta - Y_\alpha) = \sum_{\alpha < \beta} s(\alpha - \beta)\, s(Y_\alpha - Y_\beta).$$
The U-statistic

$$t = \frac{4T}{n(n-1)} - 1$$

is the same as (9.9) for the special case when one component is not a random variable.
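Mann's statistic and its identity with (9.9), taking the time index $\alpha$ as the non-random first component, can be sketched as follows; the series `y` is made up for illustration and the function names are our own.

```python
from fractions import Fraction


def s(u):
    """Sign function s(u): +1, 0 or -1."""
    return (u > 0) - (u < 0)


def mann_T(y):
    """Mann's T: number of pairs alpha < beta with Y_alpha < Y_beta."""
    n = len(y)
    return sum(y[a] < y[b] for a in range(n) for b in range(a + 1, n))


def mann_t(y):
    """The U-statistic t = 4T / (n(n - 1)) - 1."""
    n = len(y)
    return Fraction(4 * mann_T(y), n * (n - 1)) - 1


def difference_sign_covariance(x1, x2):
    """Statistic (9.9): average of s(x1_a - x1_b) s(x2_a - x2_b) over a != b."""
    n = len(x1)
    total = sum(s(x1[a] - x1[b]) * s(x2[a] - x2[b])
                for a in range(n) for b in range(n) if a != b)
    return Fraction(total, n * (n - 1))


y = [5.1, 4.8, 5.0, 3.9, 4.2, 3.1]                       # a made-up series drifting downward
print(mann_T(y), mann_t(y))                               # 2 -11/15
print(difference_sign_covariance(list(range(len(y))), y))  # -11/15
```

A downward drift makes $T$ small and $t$ negative, which is why the critical region below is of the form $t < a_n$.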
Let

$$\tau_{\alpha\beta} = s(\alpha - \beta) \int\!\!\int s(y_1 - y_2)\, dF_\alpha(y_1)\, dF_\beta(y_2) = s(\alpha - \beta) \Bigl\{ 2\int F_\beta(y)\, dF_\alpha(y) - 1 \Bigr\}.$$

We have $\tau_{\alpha\beta} = 0$ if $H_1$ is true and $\tau_{\alpha\beta} < 0$ if $H_2$ is true.

Since

$$E(t) = \tau_n = \frac{2}{n(n-1)} \sum_{\alpha < \beta} \tau_{\alpha\beta},$$

it follows that $E(t) = 0$ under $H_1$ and $E(t) < 0$ under $H_2$.

Mann's test against trend has the power function $P_n(H) = P\{t < a_n \mid H\}$, where $a_n$ is the largest number satisfying $P\{t < a_n \mid H_1\} \le \varepsilon$.

Since $a_n \to 0$ and, by (5.18), $\sigma^2(t) = O(n^{-1})$, it follows from Tchebycheff's inequality that the test is consistent (that is, $P_n(H_2) \to 1$) and hence unbiased in the limit. This has been shown by Mann, who also gave sufficient conditions under which the test is unbiased for finite $n$.

By Theorems 8.1 and 8.2 the distribution of $(t - \tau_n)/\sigma(t)$ is asymptotically normal if certain conditions are satisfied. Since (8.2), (8.3) and (8.13) are fulfilled, either of the conditions (8.4) and (8.14) is sufficient.

(h) The coefficient of partial difference sign correlation. Consider a three-variate sample $x_1, \dots, x_n$; $x_\alpha = (x_\alpha^{(1)}, x_\alpha^{(2)}, x_\alpha^{(3)})$, $(\alpha = 1, \dots, n)$. In a similar way as in section 9d we may form the set of the $n(n-1)$ triplets of difference signs,

$$s(x_\alpha^{(1)} - x_\beta^{(1)}),\ s(x_\alpha^{(2)} - x_\beta^{(2)}),\ s(x_\alpha^{(3)} - x_\beta^{(3)}), \qquad (\alpha \ne \beta;\ \alpha, \beta = 1, \dots, n). \qquad (9.34)$$

We shall assume that all $x^{(1)}$'s, $x^{(2)}$'s, and $x^{(3)}$'s are different. Then the triplets (9.34) contain only two different numbers, $+1$ and $-1$. Hence the regression functions of the three-variate population (9.34) are linear.

If $t_{12}$, $t_{13}$, and $t_{23}$ are the difference sign correlations of $\{s(x_\alpha^{(1)} - x_\beta^{(1)}), s(x_\alpha^{(2)} - x_\beta^{(2)})\}$, $\{s(x_\alpha^{(1)} - x_\beta^{(1)}), s(x_\alpha^{(3)} - x_\beta^{(3)})\}$ and $\{s(x_\alpha^{(2)} - x_\beta^{(2)}), s(x_\alpha^{(3)} - x_\beta^{(3)})\}$ respectively, we have for the coefficient $t_{12.3}$ of partial correlation between $s(x_\alpha^{(1)} - x_\beta^{(1)})$ and $s(x_\alpha^{(2)} - x_\beta^{(2)})$ with respect to $s(x_\alpha^{(3)} - x_\beta^{(3)})$,

$$t_{12.3} = \frac{t_{12} - t_{13}\, t_{23}}{\sqrt{(1 - t_{13}^2)(1 - t_{23}^2)}}. \qquad (9.35)$$
This measure of partial correlation has been suggested by Kendall [13], who gave an alternative definition of $t_{12.3}$.
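Formula (9.35) translates directly into code once the three pairwise difference sign correlations are available. A minimal Python sketch (function names are our own) that refuses the undefined case $|t_{13}| = 1$ or $|t_{23}| = 1$:

```python
from math import sqrt


def s(u):
    """Sign function s(u): +1, 0 or -1."""
    return (u > 0) - (u < 0)


def t_pair(x, y):
    """Difference sign correlation of two coordinates (all values distinct)."""
    n = len(x)
    total = sum(s(x[a] - x[b]) * s(y[a] - y[b])
                for a in range(n) for b in range(n) if a != b)
    return total / (n * (n - 1))


def partial_t(x1, x2, x3):
    """t_{12.3} of (9.35); requires |t_13| < 1 and |t_23| < 1."""
    t12, t13, t23 = t_pair(x1, x2), t_pair(x1, x3), t_pair(x2, x3)
    if abs(t13) == 1 or abs(t23) == 1:
        raise ValueError("t_13 or t_23 equals +-1; t_12.3 is undefined")
    return (t12 - t13 * t23) / sqrt((1 - t13 ** 2) * (1 - t23 ** 2))


x1 = [1, 4, 2, 8, 5, 7]
x2 = [2 * v for v in x1]          # same ordering as x1, so t_12 = 1
x3 = [3, 1, 4, 1.5, 9, 2.6]
print(partial_t(x1, x2, x3))      # 1 up to rounding
```

When $x^{(2)}$ has the same ordering as $x^{(1)}$, $t_{13} = t_{23}$ and the formula returns 1, as a partial correlation should.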

If we have two independent three-dimensional random vectors $X_1 = (X_1^{(1)}, X_1^{(2)}, X_1^{(3)})$ and $X_2 = (X_2^{(1)}, X_2^{(2)}, X_2^{(3)})$ with the same continuous d.f. $F(x^{(1)}, x^{(2)}, x^{(3)})$, the distribution of the difference signs $s(X_1^{(i)} - X_2^{(i)})$, $(i = 1, 2, 3)$, has again linear regression functions, and we may define the partial difference sign correlation

$$\tau_{12.3} = \frac{\tau_{12} - \tau_{13}\, \tau_{23}}{\sqrt{(1 - \tau_{13}^2)(1 - \tau_{23}^2)}},$$

where $\tau_{ij}$ is the difference sign correlation of $X^{(i)}$, $X^{(j)}$.

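If $t_{12.3}$ is a function of a random sample, and if $\tau_{13} \ne \pm 1$, $\tau_{23} \ne \pm 1$, the d.f. of $\sqrt{n}\,(t_{12.3} - \tau_{12.3})$ tends, by Theorem 7.5, to the normal d.f. with mean zero and variance

$$\begin{aligned}
\sigma_{12.3}^2 = \frac{4}{(1 - \tau_{13}^2)(1 - \tau_{23}^2)} \Bigl\{ & \zeta_1(\tau_{12}) + \Bigl(\frac{\tau_{23} - \tau_{12}\tau_{13}}{1 - \tau_{13}^2}\Bigr)^2 \zeta_1(\tau_{13}) + \Bigl(\frac{\tau_{13} - \tau_{12}\tau_{23}}{1 - \tau_{23}^2}\Bigr)^2 \zeta_1(\tau_{23}) \\
& - 2\,\frac{\tau_{23} - \tau_{12}\tau_{13}}{1 - \tau_{13}^2}\, \zeta_1(\tau_{12}, \tau_{13}) - 2\,\frac{\tau_{13} - \tau_{12}\tau_{23}}{1 - \tau_{23}^2}\, \zeta_1(\tau_{12}, \tau_{23}) \\
& + 2\,\frac{(\tau_{23} - \tau_{12}\tau_{13})(\tau_{13} - \tau_{12}\tau_{23})}{(1 - \tau_{13}^2)(1 - \tau_{23}^2)}\, \zeta_1(\tau_{13}, \tau_{23}) \Bigr\},
\end{aligned}$$

where

$$\zeta_1(\tau_{ij}) = E\{\Phi_1^2(X \mid \tau_{ij})\} - \tau_{ij}^2, \qquad \zeta_1(\tau_{ij}, \tau_{gh}) = E\{\Phi_1(X \mid \tau_{ij})\, \Phi_1(X \mid \tau_{gh})\} - \tau_{ij}\tau_{gh},$$

and, for instance (cf. (9.13)),

$$\Phi_1(x \mid \tau_{12}) = 1 - 2F(x^{(1)}, \infty, \infty) - 2F(\infty, x^{(2)}, \infty) + 4F(x^{(1)}, x^{(2)}, \infty).$$

If $\tau_{13} = \tau_{23} = 0$, we have

$$\sigma_{12.3}^2 = 4\zeta_1(\tau_{12}),$$

and $\sqrt{n}\,(t_{12.3} - \tau_{12.3})$ has the same limiting distribution as $\sqrt{n}\,(t_{12} - \tau_{12})$. This is in particular the case when $X^{(1)}$, $X^{(2)}$, $X^{(3)}$ are independent.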
OPTIMUM CHARACTER OF THE SEQUENTIAL PROBABILITY RATIO TEST

A. Wald and J. Wolfowitz 
Columbia University 

1. Summary. Let $S_0$ be any sequential probability ratio test for deciding between two simple alternatives $H_0$ and $H_1$, and $S_1$ another test for the same purpose. We define $(i, j = 0, 1)$:

$\alpha_i(S_j)$ = probability, under $S_j$, of rejecting $H_i$ when it is true;

$E_i^j(n)$ = expected number of observations to reach a decision under test $S_j$ when the hypothesis $H_i$ is true. (It is assumed that $E_i^1(n)$ exists.)

In this paper it is proved that, if

$$\alpha_i(S_1) \le \alpha_i(S_0) \qquad (i = 0, 1),$$

it follows that

$$E_i^0(n) \le E_i^1(n) \qquad (i = 0, 1).$$

This means that of all tests with the same power the sequential probability ratio test requires on the average fewest observations. This result had been conjectured earlier ([1], [2]).

2. Introduction. Let $p_i(x)$, $i = 0, 1$, denote two different probability density functions or (discrete) probability functions. (Throughout this paper the index $i$ will always take the values 0, 1.) Let $X$ be a chance variable whose distribution can only be either $p_0(x)$ or $p_1(x)$, but is otherwise unknown. It is required to decide between the hypotheses $H_0$, $H_1$, where $H_i$ states that $p_i(x)$ is the distribution of $X$, on the basis of $n$ independent observations $x_1, \dots, x_n$ on $X$, where $n$ is a chance variable defined (finite) on almost every infinite sequence

$$\omega = x_1, x_2, \dots,$$

i.e., $n$ is finite with probability one according to both $p_0(x)$ and $p_1(x)$. The definition of $n(\omega)$ together with the rule for deciding on $H_0$ or $H_1$ constitute a sequential test.

A sequential probability ratio test is defined with the aid of two positive numbers, $A^* > 1$, $B^* < 1$, as follows. Write for brevity

$$p_{ij} = \prod_{k=1}^{j} p_i(x_k).$$

Then $n = j$ if

$$\frac{p_{1j}}{p_{0j}} \ge A^* \quad \text{or} \quad \frac{p_{1j}}{p_{0j}} \le B^*,$$

and

$$B^* < \frac{p_{1l}}{p_{0l}} < A^* \qquad \text{if } l < j.$$

If $p_{1n}/p_{0n} \ge A^*$, the hypothesis $H_1$ is accepted; if $p_{1n}/p_{0n} \le B^*$, the hypothesis $H_0$ is accepted.
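The test just defined can be sketched in Python. The Bernoulli model, the values $p_0 = 0.3$, $p_1 = 0.7$ and the bounds $A^* = 19$, $B^* = 1/19$ are illustrative choices, not taken from the paper; the running ratio is kept on the log scale only to avoid underflow of the products $p_{ij}$. A small Monte Carlo run then estimates the error probabilities and the expected sample sizes $E_i(n)$ used below.

```python
import math
import random


def sprt(observations, log_p0, log_p1, A, B):
    """Sequential probability ratio test with bounds A* > 1 > B* > 0.

    Returns (n, accepted), accepted being the index of the accepted
    hypothesis, or (None, None) if the observations run out first.
    """
    if not (A > 1 > B > 0):
        raise ValueError("need A* > 1 > B* > 0")
    log_A, log_B = math.log(A), math.log(B)
    llr = 0.0                                  # log(p_1j / p_0j)
    for j, x in enumerate(observations, start=1):
        llr += log_p1(x) - log_p0(x)
        if llr >= log_A:
            return j, 1                        # accept H_1
        if llr <= log_B:
            return j, 0                        # accept H_0
    return None, None


def bernoulli_log_pmf(p):
    return lambda x: math.log(p) if x == 1 else math.log(1 - p)


def draws(p, rng):
    while True:                                # unlimited Bernoulli(p) stream
        yield 1 if rng.random() < p else 0


p0, p1, A, B = 0.3, 0.7, 19.0, 1 / 19          # illustrative values
lp0, lp1 = bernoulli_log_pmf(p0), bernoulli_log_pmf(p1)
rng = random.Random(12345)
for true_p, label in ((p0, "H_0"), (p1, "H_1")):
    runs = [sprt(draws(true_p, rng), lp0, lp1, A, B) for _ in range(5000)]
    mean_n = sum(n for n, _ in runs) / len(runs)
    frac_h1 = sum(d for _, d in runs) / len(runs)
    print(label, "E(n) ~", round(mean_n, 2), "P(accept H_1) ~", round(frac_h1, 3))
```

Under $H_0$ the printed acceptance frequency of $H_1$ estimates $\alpha_0$, and under $H_1$ its complement estimates $\alpha_1$.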

In this paper we limit consideration to sequential tests for which $E_i(n)$ exists, where $E_i(n)$ is the expected value of $n$ when $H_i$ is true (i.e., when $p_i(x)$ is the distribution of $X$). It has been proved in [3] that all sequential probability ratio tests belong to this class. The purpose of the paper is to prove the result stated in the first section. Throughout the proof we shall find it convenient to assume that there is an a priori probability $g_i$ that $H_i$ is true ($g_0 + g_1 = 1$; we shall write $g = (g_0, g_1)$). We are aware of the fact that many statisticians believe that in most problems of practical importance either no a priori probability distribution exists, or that even where it exists the statistical decision must be made in ignorance of it; in fact we share this view. Our introduction of the a priori probability distribution is a purely technical device for achieving the proof which has no bearing on statistical methodology, and the reader will verify that this is so. We shall always assume below that $g_i \ne 0, 1$.

Let $W_0$, $W_1$, $c$ be given positive numbers. We define

$$R = g_0\{W_0\alpha_0 + cE_0(n)\} + g_1\{W_1\alpha_1 + cE_1(n)\},$$

and call $R$ the average risk associated with a test $S$ and a given $g$ (obviously $R$ is a function of both). We shall say that $H_1$ is accepted when the decision is made that $p_1(x)$ is the distribution of $X$. We shall say that $H_0$ is rejected when $H_1$ is accepted, and vice versa. The reader may find it helpful to regard $W_i$ as a weight which measures the loss caused by rejecting $H_i$ when it is true, $c$ as the cost of a single observation, and $R$ as the average loss associated with a given $g$ and a test $S$. For mathematical purposes these are simply quantities which we manipulate in the course of the proof.


3. Role of the probability ratio. Let $g$, $W = (W_0, W_1)$, and $c$ be fixed. Let $S$ be a given sequential test, with $R(S)$ the associated risk and $n(\omega, S)$ the associated "sample size" function. Let $\psi(x_1, \dots, x_n)$ be the "decision" function; this is a function which takes only the values 0 and 1, and such that, when $\omega$ is the sample point, the hypothesis with index $\psi(x_1, \dots, x_n)$ is rejected. Define the following decision function $\varphi(x_1, \dots, x_n)$: $\varphi = 0$ when

$$\lambda = \frac{W_1 g_1 p_{1n}}{W_0 g_0 p_{0n}}$$

is greater than 1, and $\varphi = 1$ when $\lambda < 1$. When $\lambda = 1$, $\varphi$ may be 0 or 1 at pleasure.

It must be remembered that an actual decision function is a single-valued function of $(x_1, \dots, x_n)$. We note, however, that

a) the relevant properties of a test are not affected by changing the test on a set $T$ of points $\omega$ whose probability is zero according to both $H_0$ and $H_1$, i.e., changing the definition on $T$ of $n$ and/or of the decision function leaves $\alpha_0$, $\alpha_1$, $E_0(n)$ and $E_1(n)$ unaltered. In particular, the average risk $R$ remains unchanged.

b) the set of points for which $p_{0n} = p_{1n} = 0$ and $\lambda$ is indeterminate has probability zero according to both $H_0$ and $H_1$.

In view of the above we decide arbitrarily, in all sequential tests which we shall henceforth consider, to define $n = j$, and $\psi = 0$, whenever $p_{0j} = p_{1j} = 0$ and $n \ne 1, \dots, (j - 1)$. By this arbitrary action $R(S)$ will not be changed.

Let now

$$L_{in} = \frac{W_i g_i p_{in}}{g_0 p_{0n} + g_1 p_{1n}} \quad (i = 0, 1), \qquad L_n = cn + \min(L_{0n}, L_{1n}).$$

We have

$$E L_{\psi n} = \sum_i g_i W_i \alpha_i,$$

where $L_{\psi n}$ stands for $L_{0n}$ or $L_{1n}$ according as $\psi = 0$ or 1, and where the operator $E$ denotes the expected value with respect to the joint distribution of $H_i$ and $(x_1, \dots, x_n)$, i.e., $E$ is the operator $g_0E_0 + g_1E_1$. If now the event $\{\psi(S) \ne \varphi \text{ and } \lambda \ne 1\}$ has positive probability according to either $H_0$ or $H_1$, we would have, for $n = n(\omega, S)$,

$$E L_{\varphi n} < E L_{\psi n}.$$

Hence, if the decision function $\psi$ connected with the test $S$ were replaced by the decision function $\varphi$, $R$ would be decreased. Since our object throughout this proof will be to make $R$ as small as possible, we shall confine ourselves henceforth, except when the contrary is explicitly stated, to tests for which $\varphi$ is the decision function. This will be assumed even if not explicitly stated.

The function $\varphi$ has not yet been uniquely defined when $\lambda = 1$. A definition convenient for later purposes will be given in the next section. $R$ is the same for all definitions.

We thus have that $\varphi$ is a function only of $\lambda$, or, what comes to the same thing when $W$ is fixed, of $r_n = p_{1n}/p_{0n}$. Define

$$r_j = \frac{p_{1j}}{p_{0j}}, \qquad j = 1, 2, \dots.$$
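The quantities $L_{0n}$, $L_{1n}$, $L_n$ and the decision function $\varphi$ of this section can be computed directly from the two likelihoods. A minimal sketch; the function name `terminal_decision` is hypothetical, and the tie $\lambda = 1$ is resolved as $\varphi = 1$, one of the arbitrary choices the text permits. The comparison is written without dividing, so that $p_{0n} = 0$ causes no error.

```python
def terminal_decision(p0n, p1n, g0, g1, W0, W1, c, n):
    """Return (phi, L0n, L1n, Ln) for likelihoods p0n, p1n of n observations.

    phi is the index of the rejected hypothesis.
    """
    denom = g0 * p0n + g1 * p1n
    L0n = W0 * g0 * p0n / denom   # posterior expected loss of rejecting H_0
    L1n = W1 * g1 * p1n / denom   # posterior expected loss of rejecting H_1
    # lambda = W1 g1 p1n / (W0 g0 p0n) > 1  is equivalent to  W1 g1 p1n > W0 g0 p0n
    phi = 0 if W1 * g1 * p1n > W0 * g0 * p0n else 1
    return phi, L0n, L1n, c * n + min(L0n, L1n)


phi, L0, L1, Ln = terminal_decision(0.2, 0.6, 0.5, 0.5, 1.0, 1.0, 0.1, 3)
print(phi, L0, L1, Ln)
```

The rule $\varphi$ always picks the smaller of $L_{0n}$ and $L_{1n}$, which is why replacing any other decision function by $\varphi$ cannot increase $R$.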
We shall now prove

Lemma 1. Let $g$, $W$, and $c$ be fixed. There exists a sequential test $S^*$ for which the average risk is a minimum. Its sample size function $n(\omega, S^*)$ can be defined by means of a properly chosen subset $K$ of the non-negative half-line as follows. For any $\omega$ consider the associated sequence

$$r_1, r_2, \dots$$

and let $j$ be the smallest integer for which $r_j \in K$. Then $n = j$. The function $n$ may be undefined on a set of points $\omega$ whose probability according to $H_0$ and $H_1$ is zero.
Let $a = (a_1, \dots, a_d)$ be any point in some finite $d$-dimensional Euclidean space, provided only that $p_{0d}(a)$ and $p_{1d}(a)$ are not both zero. Let $b = p_{1d}(a)/p_{0d}(a)$ and let $l(a) = cd + \min(L_{0d}, L_{1d})$. Let $D$ be any sequential test whatever for which $n(\omega, D) > d$ for any $\omega$ whose first $d$ coordinates are the same as those of $a$, and for which $E(n \mid a, D) < \infty$, where $E(n \mid a, D)$ is the conditional expected value of $n$ according to the test $D$ under the condition that the first $d$ coordinates of $\omega$ are the same as those of $a$. For brevity let $G$ represent the set of points $\omega$ which fulfill this last condition, i.e., that the first $d$ coordinates of $\omega$ are the same as those of $a$. Finally, let $E(L_n \mid a, D)$ be the conditional expected value of $L_n$ according to $D$ under the condition that $\omega$ is in the set $G$. We know that $\min(L_{0d}, L_{1d})$ depends only on $r_d(a) = b$.

Write

$$v(a) = \sup_D\, [l(a) - E(L_n \mid a, D)].$$

Let $a_0 = (a_{01}, \dots, a_{0k})$ be any point such that

$$\frac{p_{1d}(a)}{p_{0d}(a)} = \frac{p_{1k}(a_0)}{p_{0k}(a_0)}.$$

Let $D_0$ be any sequential test whatever for which $n(\omega, D_0) > k$ for any $\omega$ whose first $k$ coordinates are the same as those of $a_0$, and for which $E(n \mid a_0, D_0) < \infty$. Let

$$v(a_0) = \sup_{D_0}\, [l(a_0) - E(L_n \mid a_0, D_0)].$$

We shall prove that $v(a) = v(a_0)$. Thus we shall be justified in writing

$$\gamma(b) = v(a) = v(a_0).$$

Suppose, therefore, that $v(a) > v(a_0)$. Let $D_1$ be a test of the type $D$ such that

$$l(a) - E(L_n \mid a, D_1) > \tfrac12\{v(a) + v(a_0)\}.$$
We now partially define another sequential test $D_{10}$ of the type $D_0$ as follows. Let

$$\bar a = (a_1, \dots, a_d, y_1, \dots, y_t)$$

be any sequence such that $n(\bar a, D_1) = d + t$. Then for the sequence

$$\bar a_0 = (a_{01}, \dots, a_{0k}, y_1, \dots, y_t)$$

let $n(\bar a_0, D_{10}) = k + t$. The decision function $\psi_0$ associated with $D_{10}$ will be partially defined as follows:

$$\psi_0(\bar a_0) = \varphi(\bar a).$$

(The reader will observe that it may happen that $\psi_0(\bar a_0) \ne \varphi(\bar a_0)$.) Since $r_d(a) = r_k(a_0)$ it follows that

$$l(a) - E(L_n \mid a, D_1) = l(a_0) - E(L_n \mid a_0, D_{10}) > \tfrac12\{v(a) + v(a_0)\} > v(a_0),$$

in violation of the definition of $v(a_0)$. A similar contradiction is obtained if $v(a) < v(a_0)$. Hence $v(a) = v(a_0)$, as was stated above.

We define $K$ to consist of all numbers $b$ which are such that there exist points $a$ with $r_d(a) = b$, and for which $\gamma(b) \le 0$. We shall now prove that the test $S^*$ defined in the statement of the lemma is such that $R(S^*)$ is a minimum. Recall that the average risk is the expected value of $L_n$. Let $S$ be any other test. Let $a^* = (a_1^*, \dots, a_{d^*}^*)$ be any sequence such that either $n(a^*, S^*) = d^*$, or $n(a^*, S) = d^*$, but $n(a^*, S^*) \ne n(a^*, S)$. We exclude the trivial case that the probability of the occurrence of such a sequence, under both $H_0$ and $H_1$, is zero. Let $r_{d^*}(a^*) = b^*$. The sequence $a^*$ may be one of three types:

1) $\gamma(b^*) < 0$. Hence $b^* \in K$, $n(a^*, S) > d^*$. It is more advantageous, from the point of view of diminishing the average risk, to terminate the sequential process at once, since $E(L_n \mid a^*, S) > l(a^*)$.

2) $\gamma(b^*) = 0$. Hence $b^* \in K$, $n(a^*, S) > d^*$. If $l(a^*) - E(L_n \mid a^*, S) = 0$, i.e., the supremum is actually attained by $S$, then, as far as the average risk is concerned, it makes no difference whether the sequential process is terminated with $a^*$ or continued according to $S$. If, however, $l(a^*) - E(L_n \mid a^*, S) < 0$, it is clearly disadvantageous to proceed according to $S$. It is impossible that $l(a^*) - E(L_n \mid a^*, S) > 0$, since $\gamma(b^*) = 0$.

3) $\gamma(b^*) > 0$. Hence $b^* \notin K$, $n(a^*, S) = d^*$. Clearly it is more advantageous, from the point of view of diminishing the average risk, not to terminate the sequential process, but to continue with at least one more observation. After one more observation we are either in case 1 or 2, where it is advantageous to terminate the sequential process, or again in case 3, where it is advantageous to take yet another observation.

We conclude that $R(S^*)$ is a minimum, as was to be proved.

4. A fundamental lemma. Consider the complement of $K$ with respect to the non-negative half-line, and from it delete all points $b'$ for which there exists no point $a$ in some $d$-dimensional Euclidean space such that $r_d(a) = b'$. The point 1 is never to be considered as of the type of $b'$, i.e., 1 is never to be deleted. Designate the resulting set by $\bar K$.
Our proof of the theorem to which this paper is devoted hinges on the following lemma:

Lemma 2. Let $W$, $g$, $c$ be fixed, and $\bar K$ be as defined above. There exist two positive numbers $A$ and $B$, with $B \le W_0g_0/(W_1g_1) \le A$, such that

a) if $b \in K$, then either $b \ge A$ or $b \le B$;
b) if $b \in \bar K$, $B < b < A$.

Two remarks may be made before proceeding with the proof:

1) We may now complete the definition of $\varphi$ for tests of the type of $S^*$. The reader will recall that $\varphi$ was not uniquely defined when $\lambda = 1$, i.e., when $r_n = W_0g_0/(W_1g_1)$. Lemma 2 shows that it is necessary to define $\varphi$ at $\lambda = 1$ only when $W_0g_0/(W_1g_1) \in K$, and this value is therefore either $A$ or $B$. We will define $\varphi$, at $r_n = W_0g_0/(W_1g_1)$, as 0 or 1 according as $W_0g_0/(W_1g_1)$ is $A$ or $B$, and $A \ne B$. This is simply a convenient definition which will give uniqueness. When $A = B = W_0g_0/(W_1g_1) \in K$, the situation is completely trivial, and we may take $\varphi = 0$ arbitrarily.

2) If $1 \in K$, the above lemma shows that the average risk is minimized (for fixed $W$, $g$, $c$, of course) by taking no observations at all. We have $\varphi = 0$ or 1 according as $1 \ge A$ or $1 \le B$.

Proof of the lemma: Let $h > W_0g_0/(W_1g_1)$ be a point in $\bar K$. We will prove that any point $h'$ such that $W_0g_0/(W_1g_1) < h' < h$, and such that there exists a point $a'$ in some $d'$-dimensional Euclidean space for which $r_{d'}(a') = h'$, is also in $\bar K$. In a similar way it can be shown that, if $h_0 < W_0g_0/(W_1g_1)$ is any point in $\bar K$, any point $h_0'$ such that $h_0 < h_0' < W_0g_0/(W_1g_1)$, and such that there exists a point $a_0'$ in some $d'$-dimensional Euclidean space for which $r_{d'}(a_0') = h_0'$, is also in $\bar K$. This will prove the lemma.

Let therefore $h$ and $h'$ be as above. Let $S^*$ be the sequential test based on $K$, with the decision function $\varphi$. Let $a$ be a point in $d$-space such that $r_d(a) = h$. Since $h \in \bar K$ we have $\gamma(h) > 0$.

We now wish to define partially another sequential test $\bar S$, with a decision function $\psi$ which may be different from $\varphi$, as follows: Let $a'$ be defined as above. Write

$$a = (a_1, \dots, a_d), \qquad a' = (a_1', \dots, a_{d'}').$$

Let

$$\bar a = (a_1, \dots, a_d, y_1, \dots, y_t)$$

be any sequence such that $n(\bar a, S^*) = d + t$. Then for the sequence

$$\bar a' = (a_1', \dots, a_{d'}', y_1, \dots, y_t)$$

let $n(\bar a', \bar S) = d' + t$. The decision function $\psi$ associated with $\bar S$ will be partially defined as follows:

$$\psi(\bar a') = \varphi(\bar a).$$

Clearly

$$E_i(n \mid a, S^*) - d = E_i(n \mid a', \bar S) - d' \qquad (i = 0, 1) \qquad (4.1)$$

and

$$E_i(\varphi \mid a, S^*) = E_i(\psi \mid a', \bar S) \qquad (i = 0, 1). \qquad (4.2)$$

Furthermore, we have

$$\begin{aligned} l(a) - E(L_n \mid a, S^*) = {} & \frac{g_0}{g_0 + g_1 h} \{W_0 + cd - cE_0(n \mid a, S^*) - W_0[1 - E_0(\varphi \mid a, S^*)]\} \\ & + \frac{g_1 h}{g_0 + g_1 h} \{cd - cE_1(n \mid a, S^*) - W_1 E_1(\varphi \mid a, S^*)\}. \end{aligned} \qquad (4.3)$$

Since $\gamma(h) > 0$, and since

$$cd - cE_1(n \mid a, S^*) - W_1E_1(\varphi \mid a, S^*) < 0, \qquad (4.4)$$

we must have

$$W_0 + cd - cE_0(n \mid a, S^*) - W_0[1 - E_0(\varphi \mid a, S^*)] > 0. \qquad (4.5)$$

From $h' < h$ it follows that

$$\frac{g_0}{g_0 + g_1 h'} > \frac{g_0}{g_0 + g_1 h}, \qquad \frac{g_1 h'}{g_0 + g_1 h'} < \frac{g_1 h}{g_0 + g_1 h}. \qquad (4.6)$$

Relations (4.1), (4.2), (4.4), (4.5) and (4.6) imply that the value of the right hand member of (4.3) is increased by replacing $\varphi$, $h$, $a$, $S^*$ and $d$ by $\psi$, $h'$, $a'$, $\bar S$ and $d'$, respectively. This proves our lemma.
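Lemmas 1 and 2 say that the risk-minimizing test stops as soon as the probability ratio leaves an interval $(B, A)$ containing $W_0g_0/(W_1g_1)$. The sketch below illustrates this numerically for one hypothetical Bernoulli example ($p_0 = 0.3$, $p_1 = 0.7$, $W_0 = W_1 = 10$, $c = 0.1$, $g_0 = g_1 = \tfrac12$, all our own choices). It computes the Bayes sequential rule by value iteration on the posterior probability $\pi$ of $H_1$, using piecewise-linear interpolation on a grid (our approximation), and maps the continuation set back to the ratio scale through $r = (g_0/g_1)\,\pi/(1 - \pi)$. It illustrates the lemma in one case and does not prove it.

```python
def bayes_continuation_set(p0, p1, W0, W1, c, grid=1001, tol=1e-10, max_iter=500):
    """Grid indices i (posterior pi = i/(grid - 1) of H_1) at which taking
    another Bernoulli observation is strictly better than stopping."""
    pis = [i / (grid - 1) for i in range(grid)]
    # stopping loss min(L_0, L_1): rejecting H_0 costs W0 * P(H_0), rejecting H_1 costs W1 * P(H_1)
    stop = [min(W0 * (1 - p), W1 * p) for p in pis]

    def interp(values, p):
        # piecewise-linear interpolation of grid values at p in [0, 1]
        x = p * (grid - 1)
        i = min(int(x), grid - 2)
        w = x - i
        return (1 - w) * values[i] + w * values[i + 1]

    def continue_cost(values, p):
        q1 = (1 - p) * p0 + p * p1            # predictive probability of x = 1
        after1 = p * p1 / q1                  # posterior of H_1 after x = 1
        after0 = p * (1 - p1) / (1 - q1)      # posterior of H_1 after x = 0
        return c + q1 * interp(values, after1) + (1 - q1) * interp(values, after0)

    V = stop[:]
    for _ in range(max_iter):                 # value iteration
        new = [min(stop[i], continue_cost(V, p)) for i, p in enumerate(pis)]
        converged = max(abs(a - b) for a, b in zip(new, V)) < tol
        V = new
        if converged:
            break
    return [i for i, p in enumerate(pis) if continue_cost(V, p) < stop[i] - 1e-12]


# Hypothetical example values (not from the paper)
p0, p1, g0, g1, W0, W1, c = 0.3, 0.7, 0.5, 0.5, 10.0, 10.0, 0.1
grid = 1001
idx = bayes_continuation_set(p0, p1, W0, W1, c, grid=grid)
lo, hi = idx[0] / (grid - 1), idx[-1] / (grid - 1)
B = (g0 / g1) * lo / (1 - lo)                 # r = (g0/g1) * pi / (1 - pi)
A = (g0 / g1) * hi / (1 - hi)
print("B =", round(B, 3), " A =", round(A, 3), " W0*g0/(W1*g1) =", W0 * g0 / (W1 * g1))
```

The continuation indices form one contiguous block, and the recovered bounds bracket $W_0g_0/(W_1g_1)$, matching the form of the test asserted in Lemma 2.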

If there are values which $r_j$ cannot assume, the pair $B$, $A$ might not be unique. For convenience we shall define $A$ and $B$ uniquely in the manner described below. We will always adhere to this definition thereafter.

We shall first define $\gamma(h)$ for all positive $h$ in a manner consistent with the previous definition, which defined $\gamma(h)$ only for those values of $h$ which could be assumed by $r_j$. Let $h$ be any positive number and $D(h)$ be any sequential test with the following properties:

(4.7) there exists a set $Q(h)$ of positive numbers such that $n = j$ if and only if the $j$-th member of the sequence

$$hr_1, hr_2, hr_3, \dots$$

is the first element of the sequence to be in $Q(h)$;

(4.8) $E_i(n \mid D(h)) < \infty \quad (i = 0, 1)$.

We define, for $h > W_0g_0/(W_1g_1)$,

$$\gamma(h \mid D(h)) = \frac{g_0}{g_0 + g_1 h} \{W_0E_0(\varphi \mid D(h)) - cE_0(n \mid D(h))\} + \frac{g_1 h}{g_0 + g_1 h} \{-W_1E_1(\varphi \mid D(h)) - cE_1(n \mid D(h))\}, \qquad (4.9)$$

$$\gamma(h) = \sup_{D(h)} \gamma(h \mid D(h)), \qquad (4.10)$$

with a corresponding definition for $h < W_0g_0/(W_1g_1)$. Thus $\gamma(h)$ is defined for all positive $h$. This definition coincides with the previous definition whenever the latter is applicable. It is true that the supremum operation in (4.10) is limited to tests which depend only on the probability ratio, as (4.7) implies, but the argument of Lemma 1 shows that this limitation does not diminish the supremum. (It might appear that, for $h = W_0g_0/(W_1g_1)$, $\gamma(h)$ is not uniquely defined. We shall shortly see that this is not the case.)

The quantity $\gamma(h)$ depends, of course, on $g_0$ and $g_1$. To put this in evidence, we shall also write $\gamma(h, g_0, g_1)$. One can easily verify that

$$\gamma(h, g_0, g_1) = \gamma\Bigl(1, \frac{g_0}{g_0 + g_1 h}, \frac{g_1 h}{g_0 + g_1 h}\Bigr).$$

More generally, for any positive values $h$ and $h'$, we have $\gamma(h, g_0, g_1) = \gamma(h', \bar g_0, \bar g_1)$, where $\bar g_0$ and $\bar g_1$ are suitable functions of $g_0$, $g_1$, $h$, and $h'$. Thus, if $h$ is not an admissible value of the probability ratio and $h'$ is any admissible value, we can interpret the value of $\gamma(h, g_0, g_1)$ as the value of $\gamma$ corresponding to $h'$ and some properly chosen a priori probabilities $\bar g_0$ and $\bar g_1$.


We now define $A$ as the greatest lower bound of all points $h > W_0g_0/(W_1g_1)$ for which $\gamma(h) < 0$. We define $B$ as the least upper bound of all points $h < W_0g_0/(W_1g_1)$ for which $\gamma(h) < 0$. If $\gamma(h) < 0$ for all $h$, the above definition implies $A = B = W_0g_0/(W_1g_1)$.

The argument of Lemma 2 shows that $\gamma(h)$ is monotonically increasing in the interval $(B, W_0g_0/(W_1g_1))$ and that $\gamma(h)$ is monotonically decreasing in the interval $(W_0g_0/(W_1g_1), A)$.

We shall now define a sequential test $S^*(h)$ for every positive $h$. The decision function of $S^*(h)$ will be $\varphi$, and $n = j$ if and only if the $j$-th member of the sequence

$$\gamma(hr_1), \gamma(hr_2), \gamma(hr_3), \dots$$

is the first element to be $\le 0$. We see that

$$\gamma(h) = \gamma(h \mid S^*(h)) \qquad (4.11)$$

for all $h$. Incidentally, this proves that $\gamma(h)$ was uniquely defined at $h = W_0g_0/(W_1g_1)$.

We shall now prove

Lemma 3. The function $\gamma(h)$ has the following properties:

a) It is continuous for all $h$.

b) $\gamma(A) = \gamma(B) = 0$.

c) $\gamma(h) < 0$ for $h > A$ or $h < B$.

Only a) and c) require proof, since b) is a trivial consequence of a) and the definition of $A$ and $B$.

Let $h$ be any point except $W_0g_0/(W_1g_1)$, and let $z$ be any point in a neighborhood of $h$. Within a neighborhood of $h$ both $E_0(n \mid S^*(z))$ and $E_1(n \mid S^*(z))$ are bounded. Let $\Delta$ be an arbitrarily given positive number. Let $h'$ and $h''$ be any two points in a sufficiently small neighborhood of $h$, to be described shortly. We proceed as in the argument of Lemma 2, with the present $h'$ corresponding to $h$ of Lemma 2, the present $h''$ corresponding to $h'$ of Lemma 2, and with $S^*(h')$ corresponding to $S^*$ of Lemma 2. Since $g_0/(g_0 + g_1 z)$ and $g_1 z/(g_0 + g_1 z)$ are continuous functions of $z$, and since $E_0(n \mid S^*(z))$ and $E_1(n \mid S^*(z))$ are bounded functions of $z$, we conclude that, when the neighborhood of $h$ is sufficiently small,

$$\gamma(h'') > \gamma(h') - \Delta.$$

Reversing the roles of $h'$ and $h''$ we obtain that in this neighborhood

$$\gamma(h') > \gamma(h'') - \Delta,$$

and conclude that

$$|\gamma(h') - \gamma(h'')| < \Delta.$$

Since $\Delta$ was arbitrary, this implies the continuity of $\gamma(h)$ everywhere, except perhaps at $h = W_0g_0/(W_1g_1)$.

To deal with the point $h = W_0g_0/(W_1g_1)$, proceed as follows: Using the above argument and the definitions (4.9), (4.10), we prove that $\gamma(h)$ is continuous on the right at $h = W_0g_0/(W_1g_1)$. Using, at the point $h = W_0g_0/(W_1g_1)$, the definition of $\gamma(h \mid D(h))$ for $h < W_0g_0/(W_1g_1)$,

$$\gamma(h \mid D(h)) = \frac{g_0}{g_0 + g_1 h} \{-W_0E_0(1 - \varphi \mid D(h)) - cE_0(n \mid D(h))\} + \frac{g_1 h}{g_0 + g_1 h} \{W_1E_1(1 - \varphi \mid D(h)) - cE_1(n \mid D(h))\}, \qquad (4.12)$$

(4.10) and (4.11), we prove that $\gamma(h)$ is continuous on the left at $h = W_0g_0/(W_1g_1)$. This proves a).

To prove c), we proceed as follows: Suppose for $h_0 > A$ we had $\gamma(h_0) = 0$. Since

$$\{-W_1E_1(\varphi \mid S^*(h_0)) - cE_1(n \mid S^*(h_0))\} < 0,$$

we would have that

$$\{W_0E_0(\varphi \mid S^*(h_0)) - cE_0(n \mid S^*(h_0))\} > 0.$$

An argument like that of Lemma 2 would then show that $\gamma(h) > 0$ for $W_0g_0/(W_1g_1) < h < h_0$. This, however, is impossible, because it is a violation of the definition of $A$.

In a similar way we prove that if $h < B$, $\gamma(h) < 0$. This proves c) and with it the lemma.


5. The behavior of A and B. Lemma 4. Let g and c he fixed Then A and 
B are continuous functions of Wo and Wi 

Proof: It will be sufficient to prove that $A$ is continuous, since the proof for $B$ is similar. Suppose $A > B$. Let $h_1$ and $h_2$ be such that

a) $B < h_1 < A < h_2$;

b) $h_2 - h_1 < \Delta$ for an arbitrary positive $\Delta$.

We write $y(h)$ temporarily as $y(h, W_0, W_1)$ in order to exhibit the dependence on $W_0$ and $W_1$. Then

$$y(h_1, W_0, W_1) > 0, \qquad y(h_2, W_0, W_1) < 0.$$

It follows from (4.9) that $y(h \mid D(h))$ is continuous in $W_0, W_1$, uniformly in $D(h)$. Hence $y(h, W_0, W_1) = \sup_{D(h)} y(h \mid D(h))$ is also continuous in $W_0, W_1$. Hence, for $\Delta W_0$ and $\Delta W_1$ sufficiently small,

$$y(h_1, W_0 + \Delta W_0, W_1 + \Delta W_1) > 0, \qquad y(h_2, W_0 + \Delta W_0, W_1 + \Delta W_1) < 0.$$

Therefore

$$h_1 < A(W_0 + \Delta W_0, W_1 + \Delta W_1) < h_2,$$

which proves continuity, since $\Delta$ was arbitrary.

If $A = B$, we take $h_1 < \dfrac{W_0 g_0}{W_1 g_1} < h_2$ with $h_2 - h_1 < \Delta$, and by a similar argument show that

$$y(h_1, W_0 + \Delta W_0, W_1 + \Delta W_1) < 0; \qquad y(h_2, W_0 + \Delta W_0, W_1 + \Delta W_1) < 0.$$

Thus

$$h_1 < B(W_0 + \Delta W_0, W_1 + \Delta W_1) \le A(W_0 + \Delta W_0, W_1 + \Delta W_1) < h_2.$$

This proves the lemma.

Lemma 5. Let $g$, $c$, and $W_1$ be fixed. Then $A$ is strictly monotonic in $W_0$. As $W_0$ approaches 0, $A$ approaches 0; as $W_0$ approaches $+\infty$, $A$ also approaches $+\infty$.

Proof: Since $A \ge \dfrac{W_0 g_0}{W_1 g_1}$, we have $A \to +\infty$ as $W_0 \to +\infty$. If $W_0 < c$, no reduction in average risk could compensate for taking even a single observation, whatever the value of $h$. Hence $y(h) < 0$ for all $h$ when $W_0 < c$, so that $A = B$. Since $B \le \dfrac{W_0 g_0}{W_1 g_1}$, we have $B \to 0$ as $W_0 \to 0$. Hence $A \to 0$ as $W_0 \to 0$.

It is evident from (4.9) that $y(h \mid D(h))$ is non-decreasing with increasing $W_0$ (everything else fixed). Hence also

$$y(h) = \sup_{D(h)} y(h \mid D(h))$$

is non-decreasing with increasing $W_0$, for fixed $h > \dfrac{W_0 g_0}{W_1 g_1}$ and fixed $W_1$. For a sufficiently small positive $\Delta$ and for any $h$ such that $A < h < A + \Delta$, we have

$$E_0(\varphi \mid S^*(h)) > 0.$$

Hence, for such $h$, $y(h, W_0, W_1)$ is strictly monotonically increasing with increasing $W_0$. Therefore $A$ is (strictly) monotonically increasing with increasing $W_0$.

We now define the function $W_0(W_1, \delta)$ of the two positive arguments $W_1$, $\delta$ so that

$$A(W_0(W_1, \delta), W_1) = \delta.$$

By Lemma 5 such a function exists and is single-valued.
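Lemma 5 says that $W_0 \mapsto A(W_0, W_1)$ is continuous and strictly increasing from 0 to $+\infty$. It follows that $W_0(W_1, \delta)$ can be located by bisection. The Python sketch below illustrates only that inversion step. The function `toy_A` is a hypothetical stand-in with the properties used in Lemmas 4 and 5: it is continuous, strictly increasing, and satisfies $A \ge W_0 g_0 / W_1 g_1$. It is not the $A$ of the text, which has no closed form here. The names `invert_increasing`, `toy_A` and `W0_of` are our own.

```python
import math


def invert_increasing(f, target, lo=0.0, hi=1.0, rel_tol=1e-12, max_iter=200):
    """Solve f(w) = target by bisection, assuming f is continuous and strictly
    increasing on (0, inf) with f(0+) = 0 and f(w) -> inf (cf. Lemma 5)."""
    while f(hi) < target:          # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo <= rel_tol * hi:
            break
    return 0.5 * (lo + hi)


def toy_A(w0, w1, g0=0.5, g1=0.5):
    """Hypothetical stand-in for A(W0, W1); NOT the A of the text.
    Continuous, strictly increasing in w0, never below w0*g0/(w1*g1),
    and tending to 0 and to infinity as w0 -> 0 and w0 -> infinity."""
    threshold = w0 * g0 / (w1 * g1)
    return threshold * (1.0 + math.sqrt(w0))


def W0_of(w1, delta):
    """Toy analogue of W0(W1, delta): the unique w0 with toy_A(w0, w1) = delta."""
    return invert_increasing(lambda w0: toy_A(w0, w1), delta)


w0 = W0_of(2.0, 3.0)
print(w0, toy_A(w0, 2.0))   # the second value is 3.0 up to rounding
```

Bisection fits here because it needs only continuity and monotonicity, which are exactly the properties the lemmas establish.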


6. Properties of the function $W_0(W_1, \delta)$. Lemma 6. $W_0(W_1, \delta)$ is continuous in $W_1$.

Proof: Let

$$\lim_{N \to \infty} W_{1N} = W_1,$$

and suppose that the sequence $\{W_0(W_{1N}, \delta)\}$ did not converge. Suppose $W_0'$ and $W_0''$ were two distinct limit points of this sequence. From the continuity of $A$ (Lemma 4) it would follow that

$$A(W_0', W_1) = A(W_0'', W_1).$$

This, however, violates Lemma 5. The only remaining possibility to be considered is that

$$\lim W_0(W_{1N}, \delta) = \infty.$$

If that were the case, then, since $A \ge \dfrac{W_0 g_0}{W_1 g_1}$, it would follow that $A \to \infty$, in violation of the fact that $A = \delta$.
Lemma 7. For fixed $\delta$ we have

$$\lim_{W_1 \to 0} W_0(W_1) = 0, \qquad \lim_{W_1 \to \infty} W_0(W_1) = \infty.$$

Proof: If, for small $W_1$, $W_0(W_1)$ were bounded below by a positive number, then, since $A \ge \dfrac{W_0 g_0}{W_1 g_1}$, we could make $A$ arbitrarily large by taking $W_1$ sufficiently small, in violation of the fact that $A = \delta$. To prove the second half of the lemma, assume that $W_0(W_1)$ is bounded above as $W_1 \to \infty$. Then $B \left(\le \dfrac{W_0 g_0}{W_1 g_1}\right)$ will approach zero as $W_1 \to \infty$. Let $h$ be fixed so that $B < h < \delta$.

Consider the totality of points $\omega$ for which there exists an integer $n^*(\omega)$ such that

$$h_{n^*} < B, \qquad B < h_j < \delta, \quad j < n^*.$$

The conditional expected value of $n$ in this totality, when $H_0$ is true, may be made arbitrarily large by making $B$ sufficiently small. Hence, when $W_1$ is sufficiently large, for fixed but arbitrary $h < \delta$, the optimum procedure from the point of view of minimizing the average risk is to reject $H_0$ at once without taking any more observations. This, however, contradicts the fact that $h < \delta$, and proves the lemma.

Lemma 8. For fixed $\delta > 1$ we have

$$\lim_{W_1 \to 0} B(W_0(W_1, \delta), W_1) = \delta; \qquad \lim_{W_1 \to \infty} B(W_0(W_1, \delta), W_1) = 0.$$

Proof: By Lemma 7,

$$\lim_{W_1 \to 0} W_0(W_1) = 0.$$

When, for fixed $c$, both $W_0$ and $W_1$ are small enough, then $y(h) < 0$ whatever the value of $h$. Hence $A = B$, which proves the first half of the lemma.

Now let $\{W_{1N}\}$ be a sequence such that $\lim W_{1N} = \infty$, and let $\delta > 1$. For brevity we write $B(W_{1N})$ instead of

$$B(W_0(W_{1N}, \delta), W_{1N}).$$

Suppose that, for sufficiently large $N$, $B(W_{1N})$ is bounded below by a positive number. Then, for sufficiently large $N$, the probability of rejecting $H_1$ when it is true is bounded below by a positive number. Moreover, since $B \le \dfrac{W_0 g_0}{W_1 g_1} \le A$, it follows that, for $N$ sufficiently large, $\dfrac{W_0(W_{1N}, \delta)}{W_{1N}}$ is bounded above and below by positive constants. Thus, for large $N$, the average risk of the test defined by $B(W_{1N})$, $\delta$ is greater than $u g_1 W_{1N}$, where $u$ is a positive constant which does not depend on $N$. Moreover, by the definition of $B(W_{1N})$, this risk is a minimum.

Write $W_{0N} = W_0(W_{1N}, \delta)$, and let $\varepsilon$ be a positive number such that $\varepsilon\left(\dfrac{g_0 W_{0N}}{g_1 W_{1N}} + 1\right) < \dfrac{u}{2}$ for all sufficiently large $N$. Let $V_1, V_0$, with $0 < V_1 < 1 < V_0$, be two constants such that, for the sequential probability ratio test determined by them, both $\alpha_0$ and $\alpha_1$ are $< \varepsilon$. Of course $E_0 n$ and $E_1 n$ are finite and determined by the test. For this test the average risk is less than

$$\varepsilon(g_0 W_{0N} + g_1 W_{1N}) + c g_0 E_0 n + c g_1 E_1 n < \frac{u}{2} g_1 W_{1N} + c g_0 E_0 n + c g_1 E_1 n < \frac{3u}{4} g_1 W_{1N}$$

for $W_{1N}$ large enough. This, however, contradicts the fact that the minimum risk is $> u g_1 W_{1N}$, and proves the lemma.

7. Proof of the theorem. Let a given sequential probability ratio test $S_0$ be defined by $B^*, A^*$, with $B^* < 1 < A^*$. Let $\alpha_i(S_0)$ be the probability, according to $S_0$, of rejecting $H_i$ when it is true. Let $c$ be fixed.

By Lemma 4, $B$ is a continuous function of $W_0$ and $W_1$. Let $\delta = A^*$ in Lemma 8. Then there exists a pair $W_0, W_1$, with $W_0 = W_0(W_1, A^*)$, such that

$$A(W_0, W_1) = A^*, \qquad B(W_0, W_1) = B^*.$$

Hence the average risk

$$\sum_i g_i\,[W_i\,\alpha_i(S_0) + c\,E_i(n \mid S_0)]$$

corresponding to the sequential test $S_0$ is a minimum.
Now let $S_1$ be any other test for deciding between $H_0$ and $H_1$ such that $\alpha_i(S_1) \le \alpha_i(S_0)$ and $E_i(n \mid S_1)$ exists $(i = 0, 1)$.

Then

$$\sum_i g_i\,[W_i\,\alpha_i(S_0) + c\,E_i(n \mid S_0)] \le \sum_i g_i\,[W_i\,\alpha_i(S_1) + c\,E_i(n \mid S_1)].$$

Since $\alpha_i(S_1) \le \alpha_i(S_0)$, we have

$$\sum_i g_i\,E_i(n \mid S_0) \le \sum_i g_i\,E_i(n \mid S_1).$$

Now $g_0, g_1$ were arbitrarily chosen (subject, of course, to the obvious restrictions). Hence it must be that

$$E_i(n \mid S_0) \le E_i(n \mid S_1).$$

This is the desired result.
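The theorem concerns a sequential probability ratio test $S_0$ with limits $B^* < 1 < A^*$. As an illustration only, the following Python sketch simulates such a test for Bernoulli observations and estimates $\alpha_0$, $\alpha_1$, $E_0(n)$ and $E_1(n)$ by Monte Carlo. The Bernoulli model is an assumption; the text allows any pair of simple hypotheses. Sampling continues while the likelihood ratio lies strictly between $B^*$ and $A^*$, and $H_0$ is rejected when the ratio reaches $A^*$. The choice $A^* = 0.95/0.05$, $B^* = 0.05/0.95$ follows Wald's usual approximation for nominal error rates of .05 and serves only as a usage example.

```python
import math
import random


def run_sprt(p0, p1, log_B, log_A, true_p, rng, max_n=100_000):
    """One SPRT of H0: p = p0 against H1: p = p1 on Bernoulli(true_p) data.
    Sampling continues while log_B < log(likelihood ratio) < log_A.
    Returns (rejected_H0, n)."""
    up = math.log(p1 / p0)                  # log-LR increment for a success
    down = math.log((1 - p1) / (1 - p0))    # log-LR increment for a failure
    llr = 0.0
    for n in range(1, max_n + 1):
        llr += up if rng.random() < true_p else down
        if llr >= log_A:
            return True, n
        if llr <= log_B:
            return False, n
    raise RuntimeError("no decision within max_n observations")


def operating_characteristics(p0, p1, B, A, reps=10_000, seed=12345):
    """Monte Carlo estimates {i: (alpha_i, E_i(n))} for the SPRT with limits (B, A)."""
    rng = random.Random(seed)
    log_B, log_A = math.log(B), math.log(A)
    res = {}
    for i, true_p in ((0, p0), (1, p1)):
        wrong, total_n = 0, 0
        for _ in range(reps):
            rejected_H0, n = run_sprt(p0, p1, log_B, log_A, true_p, rng)
            # under H0 an error is rejecting H0; under H1 an error is accepting H0
            wrong += rejected_H0 if i == 0 else (not rejected_H0)
            total_n += n
        res[i] = (wrong / reps, total_n / reps)
    return res


oc = operating_characteristics(p0=0.3, p1=0.6, B=0.05 / 0.95, A=0.95 / 0.05)
print(oc)
```

For such a test, Wald's inequalities give $\alpha_0 \le 1/A^*$ and $\alpha_1 \le B^*$. The Monte Carlo figures themselves depend on the seed and the number of replications.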
LIMITING DISTRIBUTION OF A ROOT OF A DETERMINANTAL

EQUATION 

By D. N. Nanda

Institute of Statistics, University of North Carolina 

1. Summary. The exact distribution of a root of a determinantal equation, when the roots are arranged in monotonic order, was obtained by S. N. Roy [3] in 1943. A different method for deriving the distribution of any one of these roots has been described by the author in [2]. In the present paper the limiting forms of these distributions are obtained. The paper gives a method by which the limiting distributions can be obtained without an inordinate amount of mathematical labor.

2. Introduction. Let $x = \|x_{ij}\|$ and $x^* = \|x^*_{ij}\|$ be two $p$-variate sample matrices with $n_1$ and $n_2$ degrees of freedom. Let $S = \|xx'\|/n_1$ and $S^* = \|x^*x^{*\prime}\|/n_2$ be the covariance matrices, which under the null hypothesis are independent estimates of the same population covariance matrix. The joint distribution of the roots of the determinantal equation $|A - \theta(A + B)| = 0$, where $A = n_1 S$ and $B = n_2 S^*$, was obtained by Hsu [1] in 1939 and is

$$R'(l, \mu, \nu) = c(l, m, n) \prod_{i=1}^{l}\left[\theta_i^{(\mu-2)/2}(1 - \theta_i)^{(\nu-2)/2}\right] \prod_{i<j}(\theta_i - \theta_j)\, d\theta_1 \cdots d\theta_l, \tag{1}$$

$$(0 < \theta_l < \theta_{l-1} < \cdots < \theta_1 < 1),$$

where $l = \min(p, n_1)$, $\mu = |p - n_1| + 1$ and $\nu = n_2 - p + 1$. The distribution density may be expressed as

$$R(l, m, n) = c(l, m, n) \prod_{i=1}^{l}\left[\theta_i^{m}(1 - \theta_i)^{n}\right] \prod_{i<j}(\theta_i - \theta_j), \tag{2}$$

where $m = \mu/2 - 1$ and $n = \nu/2 - 1$.

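3. Method. Let $\theta_i = f_i/n$ in (2). The joint distribution reduces to

$$\frac{c(l, m, n)}{n^{lm + l(l+1)/2}} \prod_{i=1}^{l}\left[f_i^{m}\left(1 - \frac{f_i}{n}\right)^{n}\right] \prod_{i<j}(f_i - f_j)\, df_1 \cdots df_l, \tag{3}$$

$$(0 < f_l < f_{l-1} < \cdots < f_1 < n).$$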

As $n$ tends to infinity the limit of (3) is

$$K(l, m) \prod_{i=1}^{l} f_i^{m} \prod_{i<j}(f_i - f_j)\, e^{-\sum_i f_i}\, df_1 \cdots df_l, \tag{4}$$

$$(0 < f_l < f_{l-1} < \cdots < f_1 < \infty).$$


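The value of $K(l, m)$ is

$$K(l, m) = \lim_{n\to\infty} \frac{c(l, m, n)}{n^{lm + l(l+1)/2}} = \lim_{n\to\infty} \frac{\pi^{l/2} \prod_{i=1}^{l} \Gamma\left(\frac{l + 2m + 2n + i + 2}{2}\right)}{n^{lm + l(l+1)/2} \prod_{i=1}^{l} \Gamma\left(\frac{2m + i + 1}{2}\right) \Gamma\left(\frac{2n + i + 1}{2}\right) \Gamma\left(\frac{i}{2}\right)}.$$

By Stirling's approximation, each ratio $\Gamma\left(\frac{l+2m+2n+i+2}{2}\right) \big/ \Gamma\left(\frac{2n+i+1}{2}\right)$ behaves like $n^{(l+2m+1)/2}$, and the product of the $l$ factors behaves like $n^{lm + l(l+1)/2}$. After simplification we get

$$\lim_{n\to\infty} \frac{\prod_{i=1}^{l} \Gamma\left(\frac{l + 2m + 2n + i + 2}{2}\right)}{n^{lm + l(l+1)/2} \prod_{i=1}^{l} \Gamma\left(\frac{2n + i + 1}{2}\right)} = 1.$$

Hence

$$K(l, m) = \frac{\pi^{l/2}}{\prod_{i=1}^{l} \Gamma\left(\frac{2m + i + 1}{2}\right) \Gamma\left(\frac{i}{2}\right)}, \tag{5}$$

and therefore

$$K(2, m) = 2^{2m+1}/\Gamma(2m + 2),$$

$$K(3, m) = 2^{2m+3}/[\Gamma(m + 1)\Gamma(2m + 3)],$$

$$K(4, m) = 2^{4m+5}/[\Gamma(2m + 2)\Gamma(2m + 4)],$$

$$K(5, m) = 2^{4m+9}/[3\Gamma(m + 1)\Gamma(2m + 3)\Gamma(2m + 5)].$$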
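Formula (5) and the four closed forms listed after it can be cross-checked numerically. The Python sketch below evaluates (5) with `math.lgamma` and compares it with the listed values. It also evaluates $c(l,m,n)/n^{lm + l(l+1)/2}$ for a large $n$. For $c(l,m,n)$ it uses the gamma-function form that appears inside the limit above, which is our reading of that expression. The function names are our own.

```python
import math


def K_general(l, m):
    """Eq. (5): K(l, m) = pi^(l/2) / prod_{i=1}^{l} Gamma((2m+i+1)/2) Gamma(i/2)."""
    log_k = 0.5 * l * math.log(math.pi)
    for i in range(1, l + 1):
        log_k -= math.lgamma((2 * m + i + 1) / 2) + math.lgamma(i / 2)
    return math.exp(log_k)


def K_listed(l, m):
    """The closed forms listed after (5)."""
    g = math.gamma
    if l == 2:
        return 2 ** (2 * m + 1) / g(2 * m + 2)
    if l == 3:
        return 2 ** (2 * m + 3) / (g(m + 1) * g(2 * m + 3))
    if l == 4:
        return 2 ** (4 * m + 5) / (g(2 * m + 2) * g(2 * m + 4))
    if l == 5:
        return 2 ** (4 * m + 9) / (3 * g(m + 1) * g(2 * m + 3) * g(2 * m + 5))
    raise ValueError("closed form listed only for l = 2, ..., 5")


def scaled_c(l, m, n):
    """c(l, m, n) / n^(lm + l(l+1)/2), with c written as in the limit above."""
    log_c = 0.5 * l * math.log(math.pi)
    for i in range(1, l + 1):
        log_c += (math.lgamma((l + 2 * m + 2 * n + i + 2) / 2)
                  - math.lgamma((2 * m + i + 1) / 2)
                  - math.lgamma((2 * n + i + 1) / 2)
                  - math.lgamma(i / 2))
    return math.exp(log_c - (l * m + l * (l + 1) / 2) * math.log(n))


for l in range(2, 6):
    print(l, K_general(l, 1.5), K_listed(l, 1.5), scaled_c(l, 1.5, 1e6))
```

Working with log-gamma values avoids overflow for large $n$. For $l = 1$ the same expression for $c$ reduces to $1/B(m+1, n+1)$, the beta normalizing constant.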

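Let

$$G_{l,m}(x) = K(l, m) \int_{0<f_l<\cdots<f_1<x} \prod_{i=1}^{l} f_i^{m} \prod_{i<j}(f_i - f_j)\, e^{-\sum_i f_i}\, df_1 \cdots df_l. \tag{6}$$

It can easily be observed that

$$G_{l,m}(x) = \Pr(f_1 < x) = \lim_{n\to\infty} \Pr(n\theta_1 < x) = \lim_{n\to\infty} \Pr\left(\theta_1 < \frac{x}{n}\right).$$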
Thus the limiting form of the distribution of the largest root can be obtained by integrating the density given in (4) according to the method described by the author in [2]. It is observed, however, that the mathematical labor is reduced considerably by adopting the following method.

Referring to the results for the exact distribution of the largest root given in [2], let $F_{l,m,n}(x) = (0, l, l-1, \ldots, 1, x; m, n)$; thus $F_{2,m,n}(x) = (0, 2, 1, x; m, n)$ and $F_{3,m,n}(x) = (0, 3, 2, 1, x; m, n)$. Then $c(l, m, n)F_{l,m,n}(x)$ is the probability that none of the roots $\theta_i$ exceeds $x$, and is thus the cumulative distribution function of the greatest root. We shall show that $\lim_{n\to\infty} c(l, m, n)F_{l,m,n}(x/n) = G_{l,m}(x)$. The reader is referred to [2] for a detailed explanation of the notation and of certain mathematical operations used in this paper.

4. Limiting distribution of the largest root. We derive the distribution of the largest root for $l = 2$ and $3$ by two methods. The straightforward method will be called Method A. A second method, which proves to be very simple and easy, will be called Method B.

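(a) $l = 2$.

(i) Method A. We have

$$\Pr(n\theta_1 < x) = G_{2,m}(x) = K(2, m) \int_{0<f_2<f_1<x} (f_1 f_2)^{m}(f_1 - f_2)\, e^{-(f_1 + f_2)}\, df_1\, df_2.$$

By using the method described in [2], we have

$$G_{2,m}(x) = K(2, m)\left\{\int_{0<f_2<f_1<x} f_1^{m+1} f_2^{m} e^{-(f_1+f_2)}\, df_1\, df_2 - \int_{0<f_2<f_1<x} f_1^{m} f_2^{m+1} e^{-(f_1+f_2)}\, df_1\, df_2\right\}$$

$$= K(2, m)\{T_0^x(y, 1, x; m + 1) - T_0^x(0, 1, y; m + 1)\},$$

where

$$T_a^b\, g(y) = \int_a^b g(y)\, y^{m} e^{-y}\, dy,$$

and

$$(a, 1, b; m + 1) = \int_a^b f^{m+1} e^{-f}\, df = (a^{m+1}e^{-a} - b^{m+1}e^{-b}) + (m + 1)(a, 1, b; m). \tag{7}$$

Hence

$$G_{2,m}(x) = K(2, m)\, T_0^x\left[y^{m+1}e^{-y} - x^{m+1}e^{-x} + (m + 1)(y, 1, x; m) + y^{m+1}e^{-y} - (m + 1)(0, 1, y; m)\right]$$

$$= K(2, m)\, T_0^x\left[2y^{m+1}e^{-y} - x^{m+1}e^{-x}\right],$$

since $T_0^x[(y, 1, x; m) - (0, 1, y; m)] = 0$: both terms equal the integral of $(f_1 f_2)^m e^{-(f_1+f_2)}$ over $0 < f_2 < f_1 < x$.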
Therefore

$$\lim_{n\to\infty} \Pr(n\theta_1 < x) = G_{2,m}(x) = K(2, m)\left[2\int_0^x y^{2m+1}e^{-2y}\, dy - x^{m+1}e^{-x}\int_0^x y^{m}e^{-y}\, dy\right]. \tag{8}$$

When $x \to \infty$, $G_{2,m}(x) \to 1$; hence $K(2, m) = 2^{2m+1}/\Gamma(2m + 2)$, the value obtained from (5).
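Equation (8) can be checked against a direct numerical integration of the limiting density (4) for $l = 2$. The Python sketch below uses a plain composite Simpson rule, which is our choice of quadrature and not part of the text. It is restricted to $m \ge 0$ so that the integrands are finite at $u = 0$.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    s = f(a) + f(b)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * f(a + j * h)
    return s * h / 3


def K2(m):
    return 2 ** (2 * m + 1) / math.gamma(2 * m + 2)


def G2_closed(x, m):
    """Eq. (8): the limiting cdf of the largest root for l = 2."""
    a = simpson(lambda u: u ** (2 * m + 1) * math.exp(-2 * u), 0.0, x)
    b = simpson(lambda u: u ** m * math.exp(-u), 0.0, x)
    return K2(m) * (2 * a - x ** (m + 1) * math.exp(-x) * b)


def G2_direct(x, m, n=200):
    """Density (4) with l = 2 integrated directly over 0 < f2 < f1 < x."""
    def inner(f1):
        return simpson(lambda f2: (f1 * f2) ** m * (f1 - f2) * math.exp(-(f1 + f2)),
                       0.0, f1, n)
    return K2(m) * simpson(inner, 0.0, x, n)


for x in (0.5, 2.0, 5.0):
    print(x, G2_closed(x, 1.0), G2_direct(x, 1.0))
```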

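We now derive the same result by Method B.

(ii) Method B.

$$F_{2,m,n}(x) = (0, 2, 1, x; m, n) = \frac{1}{m + n + 2}\left\{2\int_0^x y^{2m+1}(1 - y)^{2n+1}\, dy - x^{m+1}(1 - x)^{n+1}\int_0^x y^{m}(1 - y)^{n}\, dy\right\},$$

a result given in [2].

Replacing $x$ by $x/n$, we get

$$(0, 2, 1, x/n; m, n) = \frac{1}{m + n + 2}\left\{2\int_0^{x/n} y^{2m+1}(1 - y)^{2n+1}\, dy - \left(\frac{x}{n}\right)^{m+1}\left(1 - \frac{x}{n}\right)^{n+1}\int_0^{x/n} y^{m}(1 - y)^{n}\, dy\right\};$$

also, letting $y = u/n$, we have

$$(0, 2, 1, x/n; m, n) = \frac{1}{(m + n + 2)\, n^{2m+2}}\left\{2\int_0^x u^{2m+1}\left(1 - \frac{u}{n}\right)^{2n+1} du - x^{m+1}\left(1 - \frac{x}{n}\right)^{n+1}\int_0^x u^{m}\left(1 - \frac{u}{n}\right)^{n} du\right\}.$$

Since $c(2, m, n)/n^{2m+3} \to K(2, m)$, it follows that

$$\lim_{n\to\infty} \Pr(n\theta_1 < x) = \lim_{n\to\infty} \Pr(\theta_1 < x/n) = \lim_{n\to\infty} c(2, m, n)\,(0, 2, 1, x/n; m, n)$$

$$= \frac{2^{2m+1}}{\Gamma(2m + 2)}\left\{2\int_0^x u^{2m+1}e^{-2u}\, du - x^{m+1}e^{-x}\int_0^x u^{m}e^{-u}\, du\right\},$$

which is the same as (8), obtained by Method A.

(b) $l = 3$.

(i) Method A. We have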

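$$\Pr(n\theta_1 < x) = G_{3,m}(x) = K(3, m)\int_{0<f_3<f_2<f_1<x} \prod_{i=1}^{3} f_i^{m} \prod_{i<j}(f_i - f_j)\, e^{-(f_1+f_2+f_3)}\, df_1\, df_2\, df_3$$

$$= K(3, m)\int_{0<f_3<f_2<f_1<x} (f_1 f_2 f_3)^{m}\, e^{-(f_1+f_2+f_3)}\,\{1, 2, 3\}\, df_1\, df_2\, df_3,$$

where $\{1, 2, 3\} = f_3^2\{1, 2\} + f_2^2\{3, 1\} + f_1^2\{2, 3\}$, as given in [2], with $\{i, j\} = f_i - f_j$.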
In each term we integrate last over the variable that carries the squared factor. This gives

$$G_{3,m}(x) = K(3, m)\int_0^x y^{m+2}e^{-y}\left[(y, 2, 1, x; m) - (0, 2, y, 1, x; m) + (0, 2, 1, y; m)\right] dy,$$

where

$$(a, 2, 1, b; m) = \int_{a<f_2<f_1<b} (f_1 f_2)^{m}(f_1 - f_2)\, e^{-(f_1+f_2)}\, df_2\, df_1,$$

and $(a, 2, b, 1, c; m)$ is the integral of the same integrand over $a < f_2 < b < f_1 < c$.

We have already obtained

$$(0, 2, 1, x; m) = G_{2,m}(x)/K(2, m) = 2\int_0^x y^{2m+1}e^{-2y}\, dy - x^{m+1}e^{-x}\int_0^x y^{m}e^{-y}\, dy,$$

as given in (8).

We also need the following results, which are obtained by the method described for $l = 2$:

$$(a, 2, 1, b; m) = 2\int_a^b u^{2m+1}e^{-2u}\, du - (a^{m+1}e^{-a} + b^{m+1}e^{-b})\int_a^b u^{m}e^{-u}\, du, \tag{10}$$

$$(a, 2, b, 1, c; m) = b^{m+1}e^{-b}\int_a^c u^{m}e^{-u}\, du - a^{m+1}e^{-a}\int_b^c u^{m}e^{-u}\, du - c^{m+1}e^{-c}\int_a^b u^{m}e^{-u}\, du. \tag{11}$$

Using these results in the expression for $G_{3,m}(x)$ and simplifying, we get

$$\lim_{n\to\infty} \Pr(n\theta_1 < x) = G_{3,m}(x) = K(3, m)\Big\{2\int_0^x u^{2m+3}e^{-2u}\, du\int_0^x u^{m}e^{-u}\, du - 2\int_0^x u^{m+1}e^{-u}\, du\int_0^x u^{2m+2}e^{-2u}\, du$$

$$\qquad - x^{m+2}e^{-x}\left[2\int_0^x u^{2m+1}e^{-2u}\, du - x^{m+1}e^{-x}\int_0^x u^{m}e^{-u}\, du\right]\Big\}, \tag{12}$$

where $K(3, m) = 2^{2m+3}/[\Gamma(m + 1)\Gamma(2m + 3)]$.
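The same kind of check applies to (12). In the Python sketch below, `A(k)` and `I(k)` are our shorthand for $\int_0^x u^{2m+k}e^{-2u}\,du$ and $\int_0^x u^{m+k}e^{-u}\,du$. The closed form is compared with a direct triple Simpson integration of (4) for $l = 3$ on a coarse grid, and with its limit 1 as $x \to \infty$. The grid sizes are illustrative choices.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    s = f(a) + f(b)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * f(a + j * h)
    return s * h / 3


def K3(m):
    return 2 ** (2 * m + 3) / (math.gamma(m + 1) * math.gamma(2 * m + 3))


def G3_closed(x, m, n=2000):
    """Eq. (12) for the limiting cdf of the largest root, l = 3."""
    A = lambda k: simpson(lambda u: u ** (2 * m + k) * math.exp(-2 * u), 0.0, x, n)
    I = lambda k: simpson(lambda u: u ** (m + k) * math.exp(-u), 0.0, x, n)
    bracket = 2 * A(1) - x ** (m + 1) * math.exp(-x) * I(0)   # (8) divided by K(2, m)
    return K3(m) * (2 * A(3) * I(0) - 2 * A(2) * I(1)
                    - x ** (m + 2) * math.exp(-x) * bracket)


def G3_direct(x, m, n=60):
    """Density (4) with l = 3 integrated directly over 0 < f3 < f2 < f1 < x."""
    def dens(f1, f2, f3):
        return ((f1 * f2 * f3) ** m * (f1 - f2) * (f1 - f3) * (f2 - f3)
                * math.exp(-(f1 + f2 + f3)))

    def over_f2(f1):
        return simpson(lambda f2: simpson(lambda f3: dens(f1, f2, f3), 0.0, f2, n),
                       0.0, f1, n)
    return K3(m) * simpson(over_f2, 0.0, x, n)


print(G3_closed(2.0, 1.0), G3_direct(2.0, 1.0), G3_closed(40.0, 1.0))
```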
(ii) Method B.

$$F_{3,m,n}(x) = (0, 3, 2, 1, x; m, n)$$

$$= \frac{1}{m + n + 3}\Big[2(0, 1, x; 2m + 3, 2n + 1)(0, 1, x; m, n) - 2(0, 1, x; m + 1, n)(0, 1, x; 2m + 2, 2n + 1) - (0, x; m + 2, n + 1)(0, 2, 1, x; m, n)\Big],$$

a result given in [2]. Here $(0, 1, x; a, b) = \int_0^x y^{a}(1 - y)^{b}\, dy$ and $(0, x; a, b) = x^{a}(1 - x)^{b}$.

Replacing $x$ by $x/n$ and putting $u/n$ for the variate $y$ of integration, we have

$$F_{3,m,n}(x/n) = (0, 3, 2, 1, x/n; m, n) = \frac{1}{m + n + 3}\Big\{\frac{2}{n^{3m+5}}\int_0^x u^{2m+3}\left(1 - \frac{u}{n}\right)^{2n+1} du\int_0^x u^{m}\left(1 - \frac{u}{n}\right)^{n} du$$

$$\qquad - \frac{2}{n^{3m+5}}\int_0^x u^{m+1}\left(1 - \frac{u}{n}\right)^{n} du\int_0^x u^{2m+2}\left(1 - \frac{u}{n}\right)^{2n+1} du$$

$$\qquad - \frac{x^{m+2}(1 - x/n)^{n+1}}{n^{3m+4}(m + n + 2)}\left[2\int_0^x u^{2m+1}\left(1 - \frac{u}{n}\right)^{2n+1} du - x^{m+1}\left(1 - \frac{x}{n}\right)^{n+1}\int_0^x u^{m}\left(1 - \frac{u}{n}\right)^{n} du\right]\Big\}.$$

Hence

$$\lim_{n\to\infty} \Pr(n\theta_1 < x) = \lim_{n\to\infty} \Pr\left(\theta_1 < \frac{x}{n}\right) = \lim_{n\to\infty} c(3, m, n)F_{3,m,n}(x/n)$$

$$= K(3, m)\Big\{2\int_0^x u^{2m+3}e^{-2u}\, du\int_0^x u^{m}e^{-u}\, du - 2\int_0^x u^{2m+2}e^{-2u}\, du\int_0^x u^{m+1}e^{-u}\, du$$

$$\qquad - x^{m+2}e^{-x}\left[2\int_0^x u^{2m+1}e^{-2u}\, du - x^{m+1}e^{-x}\int_0^x u^{m}e^{-u}\, du\right]\Big\},$$

where

$$K(3, m) = 2^{2m+3}/[\Gamma(m + 1)\Gamma(2m + 3)].$$

This result is the same as (12), obtained by Method A.

We have thus shown that Method B can be used to obtain the limiting forms of the distribution of the largest root, and that it is much simpler than the straightforward method, here called Method A. The limiting distributions of the largest root for $l = 4$ and $5$ are listed below.

(c) $l = 4$.

$$\lim_{n\to\infty} \Pr(n\theta_1 < x) = \lim_{n\to\infty} \Pr\left(\theta_1 < \frac{x}{n}\right) = G_{4,m}(x)$$


= IC( 4, m) 12 jj u 2m ‘ 

2J fl X 

2m-h2 —2u /„ mH 2 —as I m —u 

' u e du — x e / u e 
_ 0 

+ 2 I 


U lm+S e - 2 U du Gkjd 


i U -2 f u 2 m+ V 2u du 

A ( 2 , m) Jo 

|V.-d» + (« + a)^W] 

1 — 2 u ; , ^ 2 ,OT+l(^) _ ^,111+3^—1 Ga >m (x') ( 

e A(2,m + 1) A(3, m)J ’ 


u 2m+3 e~ 2u du 
where

$$K(4, m) = 2^{4m+5}/[\Gamma(2m + 2)\Gamma(2m + 4)].$$

(d) $l = 5$.

$$\lim_{n\to\infty} \Pr(n\theta_1 < x) = \lim_{n\to\infty} \Pr\left(\theta_1 < \frac{x}{n}\right) = G_{5,m}(x)$$


u im+, e-^du Gs ' mix) 


k( 3 


M) 2 r 

, m) Jo 


„, 2 m +8 —in , 

u e du 


= K(5, m) ^2 jf 

■[2 jf u 2m+ V 2 “ du j (*u m e~ u du- 2 fu 2m+3 e - 

■ f u m+1 e~ u du - x m+i e~ x —'-"fo + ( m + 3) G a , m (x) "I 
J ° KM + [Jn + 6) K(^rt)\ 


u*»*e*'du \2 fu 2 m+i c~ iu du fuTe^du 
J 0 Jo 


where 


+ 2 / 

JO 

-2 f u 2 m+ 3 e~ iu du r u m+ 2 e~ u du - x m+ 3 e~ x 
J 0 Jo 

• T 2 [ rf m+i e~ iu du - x n+i e~ x f V <T“ du + (m + 2) . 

1 J « K{2,m)\ 

~ 2 />* <T 2w du - t™ + 4 <f * l 

7C(3, m + l)J 0 .ft( 4 , m )j > 


$$K(5, m) = 2^{4m+9}/[3\Gamma(m + 1)\Gamma(2m + 3)\Gamma(2m + 5)].$$


5. Limiting distribution of the smallest root. It was shown in [2] that the exact distribution of the smallest root can be obtained by using the relation

$$\Pr(\theta_l < x) = 1 - \Pr(\theta_1 < 1 - x \mid \nu, \mu).$$

This relation, however, does not help in obtaining the limiting distribution of the smallest root from that of the largest root. The limiting distribution of the smallest root can be obtained by the method illustrated below.
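(a) $l = 2$.

The exact distribution of the smallest root $\theta_2$ can be expressed as

$$\Pr(\theta_2 < x) = c(2, m, n)\{(0, 2, 1, x; m, n) + (0, 2, x, 1, z; m, n)\},$$

where $z = 1$. Replacing $x$ by $x/n$, we get

$$\Pr(\theta_2 < x/n) = c(2, m, n)\{(0, 2, 1, x/n; m, n) + (0, 2, x/n, 1, z; m, n)\},$$

where

$$(0, 2, 1, x/n; m, n) = \frac{1}{m + n + 2}\left[2\int_0^{x/n} y^{2m+1}(1 - y)^{2n+1}\, dy - \left(\frac{x}{n}\right)^{m+1}\left(1 - \frac{x}{n}\right)^{n+1}\int_0^{x/n} y^{m}(1 - y)^{n}\, dy\right]$$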
and

$$(0, 2, x/n, 1, z; m, n) = \frac{1}{m + n + 2}\left[(0, x/n; m + 1, n + 1)\int_0^{z} y^{m}(1 - y)^{n}\, dy - (0, z; m + 1, n + 1)\int_0^{x/n} y^{m}(1 - y)^{n}\, dy\right],$$

as obtained from (6) of [2].

The limiting distribution of $\theta_2$ is

$$\lim_{n\to\infty} \Pr(n\theta_2 < x) = \lim_{n\to\infty} c(2, m, n)\{(0, 2, 1, x/n; m, n) + (0, 2, x/n, 1, z; m, n)\}. \tag{13}$$

Putting $u/n$ for $y$, the variate of integration, and allowing $n$ to tend to infinity, we have

$$\lim_{n\to\infty} c(2, m, n)(0, 2, 1, x/n; m, n) = K(2, m)\left\{2\int_0^x u^{2m+1}e^{-2u}\, du - x^{m+1}e^{-x}\int_0^x u^{m}e^{-u}\, du\right\},$$

and

$$\lim_{n\to\infty} c(2, m, n)(0, 2, x/n, 1, z; m, n) = K(2, m)\, x^{m+1}e^{-x}\int_0^\infty u^{m}e^{-u}\, du = K(2, m)\, x^{m+1}e^{-x}\,\Gamma(m + 1).$$

Substituting these results in (13), we have

$$\lim_{n\to\infty} \Pr(n\theta_2 < x) = \lim_{n\to\infty} \Pr(\theta_2 < x/n) = K(2, m)\left\{2\int_0^x u^{2m+1}e^{-2u}\, du - x^{m+1}e^{-x}\int_0^x u^{m}e^{-u}\, du + x^{m+1}e^{-x}\,\Gamma(m + 1)\right\},$$

where

$$K(2, m) = 2^{2m+1}/\Gamma(2m + 2).$$
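For the smallest root with $l = 2$, the limit obtained from (13) can be compared with $1 - \Pr(\text{both limiting roots exceed } x)$, computed directly from (4). The Python sketch below truncates the infinite range of integration at a cutoff `upper = 30`. That cutoff is our own choice and is harmless here because the integrand decays like $e^{-(f_1+f_2)}$. We take $m \ge 0$.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    s = f(a) + f(b)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * f(a + j * h)
    return s * h / 3


def K2(m):
    return 2 ** (2 * m + 1) / math.gamma(2 * m + 2)


def smallest_root_cdf(x, m):
    """lim Pr(n * theta_2 < x) for l = 2, as obtained from (13)."""
    a = simpson(lambda u: u ** (2 * m + 1) * math.exp(-2 * u), 0.0, x)
    b = simpson(lambda u: u ** m * math.exp(-u), 0.0, x)
    tail = x ** (m + 1) * math.exp(-x)
    return K2(m) * (2 * a - tail * b + tail * math.gamma(m + 1))


def smallest_root_cdf_check(x, m, upper=30.0, n=400):
    """1 - Pr(x < f2 < f1) under density (4), integrated directly up to `upper`."""
    def inner(f1):
        return simpson(lambda f2: (f1 * f2) ** m * (f1 - f2) * math.exp(-(f1 + f2)),
                       x, f1, n)
    return 1.0 - K2(m) * simpson(inner, x, upper, n)


for x in (0.5, 1.0, 3.0):
    print(x, smallest_root_cdf(x, 1.0), smallest_root_cdf_check(x, 1.0))
```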

(b) $l = 3$.

The exact distribution of the smallest root can be expressed as

$$\Pr(\theta_3 < x) = c(3, m, n)[(0, 3, 2, 1, x; m, n) + (0, 3, 2, x, 1, z; m, n) + (0, 3, x, 2, 1, z; m, n)],$$

where $z = 1$.

Replacing $x$ by $x/n$ and allowing $n$ to tend to infinity, we have

$$\lim_{n\to\infty} \Pr(n\theta_3 < x) = \lim_{n\to\infty} c(3, m, n)[(0, 3, 2, 1, x/n; m, n) + (0, 3, 2, x/n, 1, z; m, n) + (0, 3, x/n, 2, 1, z; m, n)]. \tag{14}$$
The values of the components on the right-hand side of the above equation are given below.

$$\lim_{n\to\infty} c(3, m, n)(0, 3, 2, 1, x/n; m, n) = G_{3,m}(x), \text{ given by (12)},$$


$$\lim_{n\to\infty} c(3, m, n)(0, 3, 2, x/n, 1, z; m, n)$$


- X n+ \~ x 


m+1 —* 

— x e 


= K( 3, m) |£V g" u du ^2 jf 

+2 e~ x f*u n+1 e~ u duj - x m+2 e 

<T X j f\ m e~ u duj + x m+2 e^ l 


u 2m+ \~ u du 


'[*£ 


u 2m+l e~ 2u du 


+ x m+2 er u m+1 e~ a du u m e _u du 


-2 [ u m+1 e' u du [ u 

Jx JQ 

and 

lim c(3, m, n)( 0, 3, x/n, 2,1,z; m, n) = K( 3, m) j f u m e~ u du | 2 f 

TL —+ QC I JQ Jx 


u 2m+2 e~ 2u du 


2m+3 —2 u j 

u e du 


r” T 

j u e du 


u 2m+2 e~ 2u du 


- x m+ V* fV+V'dJ - x m+2 e~ x \2 f *u 2m+1 e~ 2u du - x m+1 e' x fu m e 

+ x m+2 <T [ X u m+I e- U du r u m e~ u du - 2 f u m+1 e~ u du fu 2w+2 e~ 2u du) . 
Jo J x Jq J x J 

Substituting in (14), we have

$\lim_{n\to\infty} \Pr(n\theta_3 < x) = \left(2^{2m+3}/[\Gamma(m + 1)\Gamma(2m + 3)]\right)$

•{2 f u Sm+ 3 e~ 2u du fu n e~ u du + 2 [*u 2m+2 e~ 2u du fV 

^ Jo Jo Jo J* 

- 2 [ u m+1 e~ u du fu 2m+2 e~ 2u du - 2 (V +1 <T u dy f°u 2m+2 e~ 2u d 
Jo Jo Jo J a 

o „"»+2 —a: f 2 m-H — 2 u j —a? f t 2m+l —2u j 

— e u e au — 2x e u e du 
Jq Jo 

+ 2?bH- 3 —2s ( f „ m —u j \ f m —u 

x e [jueau + Jue 


2m -|-3 —2s / J m —u 


+ x 6 


lim Pr {nd 3 < x) = 2 5m+8 /[r(m + l)r(2m + 3)] 


we (iu + u e du 


jr(2?ra H - 4) Z' m -u J , r> f . 2m-hl -2u , [ m -it , 

'1— 22 m+ 4 — J 0 u e ou + 2 J u e du J u e du 

- 2T(m + 2) fV" H V 2u dw - 2 fV + V u dw f« 2m+2 e -2 * 

Jo Jq J s 

" ^ - * m+2 e~* £ u 2m+1 e' 2u du + T(m + l)x ! 

+ x 2m+3 e~ 2x fu M e~ u d«l. 


2ffi+3 2s 
Thus this method can be used to obtain the limiting distribution of the smallest root for any value of $l$.


6. Limiting distribution of any intermediate root. The above method can also be used to obtain the limiting distribution of any intermediate root. We give the distribution of $\theta_2$ for $l = 3$. We have

$$\Pr(\theta_2 < x) = c(3, m, n)\{(0, 3, 2, 1, x; m, n) + (0, 3, 2, x, 1, z; m, n)\}, \tag{16}$$

where $z = 1$.

The limits $\lim_{n\to\infty} c(3, m, n)(0, 3, 2, 1, x/n; m, n)$ and $\lim_{n\to\infty} c(3, m, n)(0, 3, 2, x/n, 1, z; m, n)$ are given by (12) and (15) respectively. Substituting these results in (16) and simplifying, we get


$\lim_{n\to\infty} \Pr(n\theta_2 < x) =$


r(m + l)T( 2 m + 3) 



du 


fu 2m+i 

JQ 


— 2 u 

e 


du — 2 



du [ u 2m+2 e 2,1 du 
J o 


+ .r ”' 1 ' 2 e~ x 


- 4x m+2 <T I* u 2mH e~ 2u du + 2x 2m+> e~ 2x f u m e~“ 

Jo Jo 

f u M+1 e~ u du f u m e~' l du- u m e~ u du f u w e 
J x J 0 J x Jo 


m-J-1 —u 



Oi, 

2 2m+3 f t x 

hm Pr(« 0 j < *) = r(m + l)r( 2 m + 3 ) \ 2r(m + 1} j, 

- 2 r(m + 2 ) f u 2 m+ V 2u da - 4 z w+ V* f M 2 " ,+ 1 e" 2 ’' du + 2 x 2 m+ V 21 

Jo Jo 

■ f e _ “ du + X m+2 e~ x [ r u m+1 e~ u du f u'V“ du 
Jo L J* Jo 


J u m e u du J u"' +I e u du j . 


Thus the limiting distribution of any intermediate root can be obtained by the above method.


7. Further problems. The limiting distribution of the largest root is found to be very helpful in obtaining the distribution of the sum of the roots when $m = 0$. When the results are applied to canonical correlations, this condition means that the numbers of variates in the two sets differ by one. The distributions of the sum of the roots have been derived under this condition for $l = 2$, 3 and 4, and the results are being presented in the next paper of this series.

8. Acknowledgements. I am extremely thankful to Drs. P. L. Hsu and Harold Hotelling for guidance and help in this research.
ON A SOURCE OF DOWNWARD BIAS IN THE ANALYSIS OF VARIANCE

AND COVARIANCE 

By William G. Madow 

Institute of Statistics, University of North Carolina 

1. Summary. It is shown that if, in the analysis of variance, the experiments are not in a state of statistical control because the true means vary, then the test has a downward bias. The power function of the analysis of variance test is obtained when this downward bias is present.

2. Introduction. To introduce the discussion of this bias, let us consider the generalized Student's hypothesis.

Let $y_{11}, \ldots, y_{kN}$ be normally and independently distributed with variance $\sigma^2$, and let the expected value of $y_{i\nu}$ be $a_{i\nu}$.¹ Then the generalized Student's hypothesis is

(Null hypothesis) $a_{i\nu} = a$,

and the class of alternative hypotheses against which the null hypothesis is tested is

(Class A) $a_{i\nu} = a_i$.

From the statement of the null hypothesis and of the alternatives of Class A, it follows that both require

$$a_{i1} = \cdots = a_{iN}. \tag{1.1}$$

Our experiments are rarely in such perfect statistical control that (1.1) holds whether or not the null hypothesis is true. It therefore becomes reasonable to investigate the existing F-test when the alternatives to the null hypothesis are not of Class A but simply of Class B:

(Class B) Equation (1.1) is false for at least one value of $i$.

Furthermore, for many practical purposes we would prefer to test, instead of the null hypothesis, the average null hypothesis

(Average null hypothesis) $\bar a_i = \bar a$,

where $N\bar a_i = a_{i1} + \cdots + a_{iN}$ and $k\bar a = \bar a_1 + \cdots + \bar a_k$. The alternatives to the average null hypothesis are then of Class C:

(Class C) The $a_{i\nu}$ can have any values such that not all the $\bar a_i$ equal $\bar a$.

¹ Throughout this paper the letter $i$ will assume all integral values from 1 to $k$; the letters $\mu, \nu$ all integral values from 1 to $N$; the letters $\gamma, \eta$ all integral values from 1 to $m$; the letter $\alpha$ all integral values from $n_1 + \cdots + n_{\gamma-1} + 1$ to $n_1 + \cdots + n_\gamma$ $(n_0 = 0)$; and $\alpha_1, \alpha_2$ all integral values from 0 to $\infty$.
The F-test of the null hypothesis against the alternatives of Class A is, as is well known,

$$F = \frac{kN(N - 1)\sum_i(\bar y_i - \bar y)^2}{(k - 1)\sum_{i,\nu}(y_{i\nu} - \bar y_i)^2},$$

where $N\bar y_i = y_{i1} + \cdots + y_{iN}$ and $k\bar y = \bar y_1 + \cdots + \bar y_k$. To answer the questions raised above about the F-test when the average null hypothesis, or an alternative of Class B or C, is true, we must calculate the distribution of $F$ under these various conditions. This is done in Section 3.

A somewhat informal way of reaching the conclusions is to study $F$ itself. Take the expected values of the numerator and denominator of $F$ and define

$$\phi_1^2 = \frac{N\sum_i(\bar a_i - \bar a)^2}{(k - 1)\sigma^2}, \qquad \phi_2^2 = \frac{\sum_{i,\nu}(a_{i\nu} - \bar a_i)^2}{k(N - 1)\sigma^2}.$$

The expected between-class mean square is then $\sigma^2(1 + \phi_1^2)$ and the expected within-class mean square is $\sigma^2(1 + \phi_2^2)$, so the ratio of the two expected values is

$$\bar F = \frac{1 + \phi_1^2}{1 + \phi_2^2}.$$

It is well known that, in general, the larger the value of $N$, the more closely $F$ approximates $\bar F$. From this fact it is easy to see why $F \approx 1$ if the null hypothesis is true. If the null hypothesis is false but an alternative of Class A is true, then

$$F \approx 1 + \phi_1^2 > 1,$$

so that large values of $F$ become more likely than if the null hypothesis were true.
However, if an alternative of Class B is true, then

$$F \approx \frac{1 + \phi_1^2}{1 + \phi_2^2},$$

so that if $\phi_1^2 < \phi_2^2$, small values of $F$ occur more frequently than the null hypothesis indicates. Thus we would tend to accept the null hypothesis more often than desired when it is false. Even when the null hypothesis is false, so that $\phi_1^2 > 0$, the values of $F$ will tend to be smaller if $\phi_2^2 > 0$ than if $\phi_2^2 = 0$, whether or not $\phi_1^2 < \phi_2^2$. The probability of an error of the first kind is less than the value $\varepsilon$ we may have selected in advance. In addition, the power of the test is less than Tang's tables [1] indicate. The lack of statistical control, represented by variation of the expected values within a class, makes rejection of the null hypothesis less likely than the standard F-test indicates, whether the hypothesis is true or false. Furthermore, even for relatively low values of $\phi_2^2$, the probabilities of rejection may be reduced by over 40 per cent, as some examples given below show.
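The informal argument rests on the two ratios $\phi_1^2 = N\sum_i(\bar a_i - \bar a)^2/((k-1)\sigma^2)$ and $\phi_2^2 = \sum_{i,\nu}(a_{i\nu} - \bar a_i)^2/(k(N-1)\sigma^2)$. The minimal Python sketch below computes them, together with $\bar F = (1+\phi_1^2)/(1+\phi_2^2)$, from a $k \times N$ table of expected values. The example table is made up so that the average null hypothesis holds while (1.1) fails.

```python
def phi_squares(a, sigma2):
    """a[i][v] = expected value a_{iv} (k classes, N observations per class).
    Returns (phi1^2, phi2^2, Fbar) with the definitions used in the text."""
    k, N = len(a), len(a[0])
    a_bar_i = [sum(row) / N for row in a]
    a_bar = sum(a_bar_i) / k
    phi1_sq = N * sum((ai - a_bar) ** 2 for ai in a_bar_i) / ((k - 1) * sigma2)
    phi2_sq = (sum((aiv - ai) ** 2 for row, ai in zip(a, a_bar_i) for aiv in row)
               / (k * (N - 1) * sigma2))
    return phi1_sq, phi2_sq, (1 + phi1_sq) / (1 + phi2_sq)


# Average null hypothesis true (equal class means) but (1.1) false:
print(phi_squares([[1.0, -1.0], [2.0, -2.0]], sigma2=1.0))  # phi1^2 = 0, phi2^2 = 5, Fbar = 1/6
```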

If the average null hypothesis is true but (1.1) is false, it follows that

$$F \approx \frac{1}{1 + \phi_2^2},$$

so that the full effect of the downward bias occurs in that case. Thus, where statistical control is lacking, testing the average null hypothesis by the F-test may well lead to accepting the hypothesis when it is false. If the null hypothesis is rejected, however, we can expect the differences among the true means to be even larger than Tang's tables indicate.

To illustrate, it is shown in Section 4 that if $k = 5$ and $N = 7$, and the average null hypothesis is true but (1.1) is false, then the probability of rejecting the average null hypothesis will not be the preassigned .05 but something less than .03 if $\phi_2^2 > .05$. Furthermore, if $\phi_2^2 > .07$, the power of the F-test in this example will be reduced by at least 40 per cent, whatever the value of $\phi_1^2$.
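The downward bias can also be seen by simulation. The Python sketch below is a Monte Carlo experiment of our own design, not the method of the paper. For $k = 5$ and $N = 7$ it first estimates a 5 per cent critical value of $F$ under the null hypothesis. It then applies that critical value to data whose class means are all equal, so that the average null hypothesis holds, but whose expected values follow a linear trend within each class, so that (1.1) fails. The trend size `spread` is an arbitrary illustration parameter.

```python
import random


def f_statistic(y):
    """One-way analysis-of-variance F for a k x N table y."""
    k, N = len(y), len(y[0])
    means = [sum(row) / N for row in y]
    grand = sum(means) / k
    between = N * sum((mi - grand) ** 2 for mi in means) / (k - 1)
    within = sum((v - mi) ** 2 for row, mi in zip(y, means) for v in row) / (k * (N - 1))
    return between / within


def simulate_F(a, reps, rng, sigma=1.0):
    """F statistics for data y_iv = a_iv + normal(0, sigma^2) noise."""
    return [f_statistic([[aiv + rng.gauss(0.0, sigma) for aiv in row] for row in a])
            for _ in range(reps)]


def rejection_rate(k=5, N=7, spread=0.0, reps=4000, seed=2024, level=0.05):
    rng = random.Random(seed)
    null = sorted(simulate_F([[0.0] * N for _ in range(k)], reps, rng))
    crit = null[int((1 - level) * reps)]            # Monte Carlo critical value
    # within-class trend with zero class mean: average null true, (1.1) false
    trend = [spread * (v - (N - 1) / 2) for v in range(N)]
    F = simulate_F([list(trend) for _ in range(k)], reps, rng)
    return sum(f > crit for f in F) / reps


print(rejection_rate(spread=0.0), rejection_rate(spread=0.3))
```

With `spread = 0.3` the within-class variation gives $\phi_2^2 = 0.42$, and the estimated rejection rate falls well below the nominal .05. The exact figures depend on the seed and the number of replications.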

The conclusions reached above remain valid for the analysis of variance and covariance in general. In the general case, however, the average null hypothesis may be much less useful in simplifying the analysis, because the parameter $\phi_1^2$ no longer vanishes when the average null hypothesis is true. For example, let $Ey_\nu = \beta_\nu x_\nu$, and let the average null hypothesis be $\bar\beta = 0$, where $N\bar\beta = \beta_1 + \cdots + \beta_N$. Calculating

$$\phi_1^2 = \frac{\left(\sum_\nu \beta_\nu x_\nu^2\right)^2}{\sigma^2 \sum_\nu x_\nu^2},$$

we see that in general $\phi_1^2$ will not vanish when $\bar\beta$ vanishes.

As shown above, the average null hypothesis may not be very important in the case of regression. But if the “variance between treatments” is a function of arithmetic means of the random variables, as in the “pure” analysis of variance, the average null hypothesis may well be very useful. Simple examples are provided by the randomized block, Latin square, and similar designs.

The distributions that we shall need are given in Section 3. The inequalities on which the demonstration of the bias rests are obtained in Section 4.

It would be highly desirable to extend Tang's tables so that they answer the questions raised by this source of bias. Until such extensions exist, the inequalities of Section 4 may give some rough idea, but they are not sharp enough.


3. The calculation of the distributions. The following theorem was proved, although not explicitly stated, as part of an earlier note [2]. (Note the change from $x_\nu$ to $y_\nu$ as the notation for the random variable.)
Theorem 1. Let $y_1, \ldots, y_N$ be normally and independently distributed with variance $\sigma^2$ and means $a_1, \ldots, a_N$, and let $q_1, \ldots, q_m$ be quadratic forms

$$q_\gamma = \sum_{\mu,\nu} a^{(\gamma)}_{\mu\nu}\, y_\mu y_\nu$$

of ranks $n_1, \ldots, n_m$. Then, if an orthogonal transformation

$$y_\nu = \sum_\mu c_{\nu\mu} z_\mu$$

exists such that

$$q_\gamma = \sum_\alpha z_\alpha^2, \tag{2.1}$$

it follows that the random variables $q_\gamma/\sigma^2$ are independently distributed in $\chi'^2$ distributions with degrees of freedom $n_1, \ldots, n_m$ and parameters $\lambda_1, \ldots, \lambda_m$, where

$$\lambda_\gamma = \frac{1}{2\sigma^2}\sum_{\mu,\nu} a^{(\gamma)}_{\mu\nu}\, a_\mu a_\nu = \frac{Eq_\gamma - n_\gamma\sigma^2}{2\sigma^2}.$$

Various conditions for the existence of an orthogonal transformation satisfying (2.1) of Theorem 1 have been given. Among these are:

1. Cochran's [3] condition. If $\sum_\gamma q_\gamma = \sum_\nu y_\nu^2$, then a necessary and sufficient condition for the existence of an orthogonal transformation satisfying (2.1) is $\sum_\gamma n_\gamma = N$.

2. Craig's [4] condition. If $A_\gamma$ denotes the matrix $(a^{(\gamma)}_{\mu\nu})$, then a necessary and sufficient condition for the existence of an orthogonal transformation satisfying (2.1) is $A_\gamma A_\eta = \delta_{\gamma\eta}A_\gamma$, where $\delta_{\gamma\eta}$ is the null matrix if $\gamma \ne \eta$ and the identity matrix if $\gamma = \eta$.

3. Linear hypothesis condition (Kolodziejczyk [5]). If $\lambda$ is the likelihood ratio test of a linear hypothesis and $E^2 = 1 - \lambda^{2/N}$, then $E^2 = q_1/(q_1 + q_2)$, and an orthogonal transformation satisfying (2.1) with $m = 2$ exists.
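Cochran's and Craig's conditions can be checked directly for the quadratic forms of the one-way analysis of variance. The Python sketch below uses exact rational arithmetic. It builds the matrices of $q_0 = kN\bar y^2$, $q_1 = N\sum_i(\bar y_i - \bar y)^2$ and $q_2 = \sum_{i,\nu}(y_{i\nu} - \bar y_i)^2$; the term $q_0$ is added by us so that the forms sum to $\sum y^2$. It then confirms that the ranks add up to the number of variables, $kN$, and that $A_\gamma A_\eta = \delta_{\gamma\eta}A_\gamma$. The helper names are our own.

```python
from fractions import Fraction


def anova_matrices(k, N):
    """Matrices of q0 = kN*ybar^2, q1 = N*sum_i (ybar_i - ybar)^2 and
    q2 = sum_{i,v} (y_iv - ybar_i)^2, for y listed as y_11, ..., y_1N, ..., y_kN."""
    size = k * N
    group = [j // N for j in range(size)]
    P0 = [[Fraction(1, size)] * size for _ in range(size)]
    Pg = [[Fraction(1, N) if group[r] == group[c] else Fraction(0) for c in range(size)]
          for r in range(size)]
    P1 = [[Pg[r][c] - P0[r][c] for c in range(size)] for r in range(size)]
    P2 = [[(1 if r == c else 0) - Pg[r][c] for c in range(size)] for r in range(size)]
    return [P0, P1, P2]


def matmul(A, B):
    return [[sum(A[r][t] * B[t][c] for t in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


mats = anova_matrices(k=3, N=2)
print([trace(P) for P in mats])    # traces (= ranks) 1, k - 1 = 2 and k(N - 1) = 3
```

Because every matrix here is idempotent, its rank equals its trace. That is why the traces can be compared with $n_0 = 1$, $n_1 = k - 1$ and $n_2 = k(N-1)$.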

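To summarize some results obtained by Tang [1], let us state

Theorem 2. If $\chi_1'^2$ and $\chi_2'^2$ are independently distributed in $\chi'^2$ distributions with $n_1$ and $n_2$ degrees of freedom and parameters $\lambda_1$ and $\lambda_2$, and if

$$E^2 = \frac{\chi_1'^2}{\chi_1'^2 + \chi_2'^2},$$

then the probability density of $E^2$ is

$$p = p(E^2 \mid \lambda_1, \lambda_2, n_1, n_2) = e^{-\lambda_1-\lambda_2}(E^2)^{(n_1/2)-1}(1 - E^2)^{(n_2/2)-1}\sum_{\alpha_1,\alpha_2}\frac{\lambda_1^{\alpha_1}\lambda_2^{\alpha_2}}{\alpha_1!\,\alpha_2!}\,\frac{\Gamma\left(\frac{n_1+n_2}{2} + \alpha_1 + \alpha_2\right)}{\Gamma\left(\frac{n_1}{2} + \alpha_1\right)\Gamma\left(\frac{n_2}{2} + \alpha_2\right)}\,(E^2)^{\alpha_1}(1 - E^2)^{\alpha_2}. \tag{2.2}$$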
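By assigning certain values to $\lambda_1$ and $\lambda_2$ we obtain the following special cases of (2.2):

$$p_1 = p(E^2 \mid \lambda_1, 0, n_1, n_2) = e^{-\lambda_1}(E^2)^{(n_1/2)-1}(1 - E^2)^{(n_2/2)-1}\sum_{\alpha_1}\frac{\lambda_1^{\alpha_1}}{\alpha_1!}\,\frac{\Gamma\left(\frac{n_1+n_2}{2} + \alpha_1\right)}{\Gamma\left(\frac{n_1}{2} + \alpha_1\right)\Gamma\left(\frac{n_2}{2}\right)}\,(E^2)^{\alpha_1}, \tag{2.3}$$

$$p_2 = p(E^2 \mid 0, \lambda_2, n_1, n_2) = e^{-\lambda_2}(E^2)^{(n_1/2)-1}(1 - E^2)^{(n_2/2)-1}\sum_{\alpha_2}\frac{\lambda_2^{\alpha_2}}{\alpha_2!}\,\frac{\Gamma\left(\frac{n_1+n_2}{2} + \alpha_2\right)}{\Gamma\left(\frac{n_1}{2}\right)\Gamma\left(\frac{n_2}{2} + \alpha_2\right)}\,(1 - E^2)^{\alpha_2}, \tag{2.4}$$

$$p_0 = p(E^2 \mid 0, 0, n_1, n_2) = \frac{\Gamma\left(\frac{n_1+n_2}{2}\right)}{\Gamma\left(\frac{n_1}{2}\right)\Gamma\left(\frac{n_2}{2}\right)}(E^2)^{(n_1/2)-1}(1 - E^2)^{(n_2/2)-1}. \tag{2.5}$$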


Note that (2.3) is Tang's distribution (112), on which the calculations of his tables were based. To see this we need only make the correspondence (this paper, Tang): $\lambda_1$ and $\lambda$; $n_1, n_2$ and $f_1, f_2$; $\alpha_1$ and $i$.

We define $\varepsilon$ to be the probability of an error of the first kind. Tang obtained the critical values $E_\varepsilon^2$ of $E^2$ by requiring that

$$\int_{E_\varepsilon^2}^{1} p_0\, dE^2 = \varepsilon, \quad \text{say .01 or .05}.$$

Then he calculated

$$P_{II} = \int_0^{E_\varepsilon^2} p_1(E^2 \mid \lambda_1, 0, n_1, n_2)\, dE^2$$

using the values of $E_\varepsilon^2$ obtained above. Hence $1 - P_{II}$ is the power of the test. If, however, $\lambda_1 = 0$ but $\lambda_2 \ne 0$, then to find

$$\int_{E_\varepsilon^2}^{1} p_2\, dE^2$$

we could make the transformation $G^2 = 1 - E^2$ and find
$$P_{III} = \int_0^{1 - E_\varepsilon^2} p(G^2 \mid 0, \lambda_2, n_1, n_2)\, dG^2.$$

It is easy to verify that

$$p(G^2 \mid 0, \lambda_2, n_1, n_2) = p_1(E^2 \mid \lambda_2, 0, n_2, n_1)$$

if we put $G^2$ in place of $E^2$ in the latter density. It follows that full tables of Tang's distribution would be sufficient to calculate $P_{III}$, since

$$P_{III} = \int_0^{1 - E_\varepsilon^2} p_1(E^2 \mid \lambda_2, 0, n_2, n_1)\, dE^2.$$

Tang's tables, however, are not sufficiently extensive. Furthermore, tables of (2.2) are also necessary, and as yet these do not exist. Some useful conclusions can nevertheless be drawn from the inequalities obtained in the following section.
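Since tables of (2.2) are not available, the integrals can at least be evaluated numerically for a given case. The Python sketch below is our own illustration. It sums the series (2.4) with `math.lgamma` and integrates it by Simpson's rule. It finds $E_\varepsilon^2$ from (2.5) by bisection and reports the probability of rejection when $\lambda_1 = 0$ and $\lambda_2 > 0$, that is, when the average null hypothesis is true but (1.1) is false. The values $n_1 = 4$, $n_2 = 30$ correspond to $k = 5$, $N = 7$. The series is truncated at a fixed number of terms, which is adequate only for moderate $\lambda_2$, and the code assumes $n_1, n_2 > 2$ so that the density vanishes at both ends.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson rule with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    s = f(a) + f(b)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * f(a + j * h)
    return s * h / 3


def p2_density(e2, lam2, n1, n2, terms=80):
    """Density (2.4) of E^2 with lambda_1 = 0; lam2 = 0 gives p_0 of (2.5)."""
    if e2 <= 0.0 or e2 >= 1.0:
        return 0.0
    base = (-lam2 + (n1 / 2 - 1) * math.log(e2) + (n2 / 2 - 1) * math.log(1 - e2)
            - math.lgamma(n1 / 2))
    total = 0.0
    for a2 in range(terms if lam2 > 0 else 1):
        log_term = (base + (a2 * math.log(lam2) if a2 else 0.0) - math.lgamma(a2 + 1)
                    + math.lgamma((n1 + n2) / 2 + a2) - math.lgamma(n2 / 2 + a2)
                    + a2 * math.log(1 - e2))
        total += math.exp(log_term)
    return total


def upper_tail(e_crit, lam2, n1, n2):
    """Integral of p_2 over (e_crit, 1): the probability that the test rejects."""
    return simpson(lambda e: p2_density(e, lam2, n1, n2), e_crit, 1.0)


def critical_value(eps, n1, n2):
    """E_eps^2 such that the integral of p_0 over (E_eps^2, 1) equals eps (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if upper_tail(mid, 0.0, n1, n2) > eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n1, n2 = 4, 30          # k = 5 classes, N = 7 observations per class
ec = critical_value(0.05, n1, n2)
for lam2 in (0.0, 0.75, 3.0):
    print(lam2, upper_tail(ec, lam2, n1, n2))
```

The checks compare the computed rejection probabilities with the bounds $e^{-\lambda_2}\varepsilon$ and $\varepsilon\exp\{\lambda_2[(1 - E_\varepsilon^2)(n_1 + n_2)/n_2 - 1]\}$. These bounds follow from integrating (3.2) with $\lambda_1 = 0$ over $E^2 > E_\varepsilon^2$.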

First, however, let us evaluate $n_1$, $n_2$, $\lambda_1$ and $\lambda_2$ for the generalized Student's hypothesis discussed in the introduction. It is easy to see that $n_1 = k - 1$ and $n_2 = k(N - 1)$. To evaluate $\lambda_1$ and $\lambda_2$ we note from Theorem 1 that we need only substitute $Ey_{tv}$ for $y_{tv}$ in $q_1$ and $q_2$, where

$$q_1 = N\sum_t(\bar y_t - \bar y)^2, \qquad q_2 = \sum_{t,v}(y_{tv} - \bar y_t)^2.$$

Upon making these substitutions we obtain

$$\lambda_1 = \frac{N}{2\sigma^2}\sum_t(a_t - \bar a)^2, \qquad \lambda_2 = \frac{1}{2\sigma^2}\sum_{t,v}(a_{tv} - a_t)^2.$$

Thus the various hypotheses concerning the $a_{tv}$ influence the distribution of $F$, or of $E^2 = n_1F/(n_1F + n_2)$, by affecting the values of $\lambda_1$ and $\lambda_2$.


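4. Limits of the values of p. It follows readily from (2.2) that

$$p = \frac{\Gamma\left(\frac{n_1+n_2}{2}\right)}{\Gamma\left(\frac{n_1}{2}\right)\Gamma\left(\frac{n_2}{2}\right)}(E^2)^{(n_1/2)-1}(1-E^2)^{(n_2/2)-1}e^{-\lambda_1-\lambda_2}\sum_{\alpha_1,\alpha_2=0}^{\infty}\frac{\lambda_1^{\alpha_1}\lambda_2^{\alpha_2}}{\alpha_1!\,\alpha_2!}C_{\alpha_1\alpha_2}(E^2)^{\alpha_1}(1-E^2)^{\alpha_2}, \tag{3.1}$$

where

$$C_{\alpha_1\alpha_2} = \frac{\Gamma\left(\frac{n_1+n_2}{2}+\alpha_1+\alpha_2\right)\Gamma\left(\frac{n_1}{2}\right)\Gamma\left(\frac{n_2}{2}\right)}{\Gamma\left(\frac{n_1+n_2}{2}\right)\Gamma\left(\frac{n_1}{2}+\alpha_1\right)\Gamma\left(\frac{n_2}{2}+\alpha_2\right)}.$$

Now if $a > 0$, $b > 0$, and $j$ is an integer $\ge 1$, we have

$$1 < \left(1+\frac{b}{a}\right)\left(1+\frac{b}{a+2}\right)\cdots\left(1+\frac{b}{a+2(j-1)}\right) \le \left(1+\frac{b}{a}\right)^{j}.$$

Since $C_{\alpha_1\alpha_2}$ is the product of $\alpha_1$ factors of the form $1 + b/(a+2(j-1))$ with $a = n_1$, $b = n_2$, and $\alpha_2$ factors of the same form with $a = n_2$, $b = n_1 + 2\alpha_1$, it follows that

$$1 \le C_{\alpha_1\alpha_2} \le \left(\frac{n_1+n_2}{n_1}\right)^{\alpha_1}\left(\frac{n_1+n_2+2\alpha_1}{n_2}\right)^{\alpha_2}.$$

Substituting, we see that

$$p_0e^{-\lambda_1-\lambda_2} \le p \le p_0\exp\left\{-\lambda_1-\lambda_2+\lambda_1E^2\,\frac{n_1+n_2}{n_1}\exp\left[\frac{2\lambda_2(1-E^2)}{n_2}\right]+\lambda_2(1-E^2)\,\frac{n_1+n_2}{n_2}\right\} \tag{3.2}$$

and

$$p_0e^{-\lambda_2} \le p_2 \le p_0\exp\left\{-\lambda_2+\lambda_2(1-E^2)\,\frac{n_1+n_2}{n_2}\right\}. \tag{3.3}$$

Let $2n_i\phi_i^2 = \lambda_i$, $i = 1, 2$.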
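The bounds (3.2) can be spot-checked numerically. The sketch below sums the series (3.1) for $p/p_0$ directly, truncated after 40 terms in each index (an arbitrary choice), and compares the result with the lower bound $e^{-\lambda_1-\lambda_2}$ and with the upper bound of (3.2). The values of $n_1$, $n_2$, $\lambda_1$, $\lambda_2$ and $E^2$ are hypothetical. Truncation can only lower the computed sum, so it cannot produce a spurious violation of the upper bound.

```python
import math

def poisson_weight(k, lam):
    """Poisson probability e^(-lam) lam^k / k! for lam > 0."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def ratio_to_p0(x, lam1, lam2, n1, n2, terms=40):
    """p / p0 from (3.1): Poisson weights times C_{a1 a2} x^a1 (1 - x)^a2, summed."""
    s = (n1 + n2) / 2
    total = 0.0
    for a1 in range(terms):
        for a2 in range(terms):
            log_c = (math.lgamma(s + a1 + a2) + math.lgamma(n1 / 2) + math.lgamma(n2 / 2)
                     - math.lgamma(s) - math.lgamma(n1 / 2 + a1) - math.lgamma(n2 / 2 + a2))
            total += (poisson_weight(a1, lam1) * poisson_weight(a2, lam2)
                      * math.exp(log_c) * x ** a1 * (1 - x) ** a2)
    return total

def bounds_3_2(x, lam1, lam2, n1, n2):
    """Lower and upper bounds for p / p0 given by (3.2)."""
    lower = math.exp(-lam1 - lam2)
    upper = math.exp(-lam1 - lam2
                     + lam1 * x * (n1 + n2) / n1 * math.exp(2 * lam2 * (1 - x) / n2)
                     + lam2 * (1 - x) * (n1 + n2) / n2)
    return lower, upper

n1, n2 = 4, 30
for lam1, lam2 in [(0.5, 0.5), (2.0, 0.3), (0.2, 2.0)]:
    for x in (0.1, 0.3, 0.5, 0.9):
        lo, hi = bounds_3_2(x, lam1, lam2, n1, n2)
        r = ratio_to_p0(x, lam1, lam2, n1, n2)
        print(lam1, lam2, x, lo <= r <= hi)   # expected True on every line
```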


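Theorem 3. Let

$$\epsilon = \int_{E_\epsilon^2}^{1} p_0\,dE^2,$$

so that $\epsilon$ is the probability of an error of the first kind, and let $0 < \delta < 1$. Then, for all values of $\phi_2^2$,

$$\epsilon \ge \int_{E_\epsilon^2}^{1} p_2\,dE^2, \tag{3.4}$$

and if $E_\epsilon^2 > n_1/(n_1+n_2)$, it follows that

$$\epsilon > \epsilon\exp\{-2n_2\phi_2^2 + 2\phi_2^2(1-E_\epsilon^2)(n_1+n_2)\} \ge \int_{E_\epsilon^2}^{1} p_2\,dE^2 \ge \epsilon e^{-2n_2\phi_2^2}. \tag{3.5}$$

Furthermore, for all values of $\phi_2^2$,

$$\int_{E_\epsilon^2}^{1} p_1\,dE^2 \ge \int_{E_\epsilon^2}^{1} p\,dE^2, \tag{3.6}$$

and if $E_\epsilon^2 > (n_1+2)/(n_1+n_2)$, it follows that

$$\int_{E_\epsilon^2}^{1} p_1\,dE^2 > \exp\{-2n_2\phi_2^2 + 2\phi_2^2(1-E_\epsilon^2)(n_1+n_2) + 2\phi_2^2\}\int_{E_\epsilon^2}^{1} p_1\,dE^2 \ge \int_{E_\epsilon^2}^{1} p\,dE^2 \ge e^{-2n_2\phi_2^2}\int_{E_\epsilon^2}^{1} p_1\,dE^2. \tag{3.7}$$

Finally, if $\gamma$ can assume the two values 0 and 2, it follows that if

$$\phi_2^2 \ge \frac{-\log\delta}{2\left[E_\epsilon^2(n_1+n_2) - (n_1+\gamma)\right]} > 0, \tag{3.8}$$

then if $\gamma = 0$,

$$\int_{E_\epsilon^2}^{1} p_2\,dE^2 \le \delta\epsilon, \tag{3.9}$$

and if $\gamma = 2$,

$$\int_{E_\epsilon^2}^{1} p\,dE^2 \le \delta\int_{E_\epsilon^2}^{1} p_1\,dE^2. \tag{3.10}$$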


Proof. To prove (3.4) and (3.6) it is only necessary to follow Daly's [6] procedure.² Since

$$\exp\{-2n_2\phi_2^2 + 2\phi_2^2(1-E^2)(n_1+n_2) + \gamma\phi_2^2\} \quad\text{and}\quad \exp\{-n_2\phi_2^2E^2\}$$

are decreasing functions of $E^2$, and

$$\exp\{-2n_2\phi_2^2 + 2\phi_2^2(1-E^2)(n_1+n_2) + \gamma\phi_2^2\} < 1 \quad\text{if}\quad E^2 > \frac{n_1+\gamma}{n_1+n_2},$$

the inequalities (3.5) and (3.7) follow immediately from (3.2) and (3.3). Finally,

$$\exp\{-2n_2\phi_2^2 + 2\phi_2^2(1-E_\epsilon^2)(n_1+n_2) + \gamma\phi_2^2\} \le \delta < 1$$

if (3.8) is true, so that (3.9) and (3.10) follow.

From (3.8), (3.9) and (3.10) we can calculate either a lower limit for the bias, if we know $\phi_2^2$, or the upper limit that $\phi_2^2$ can have if we wish the bias to be not greater than some given amount. However, these limits do not answer the important question of what value $\Phi$ ensures that the bias is less than $(1-\delta)\epsilon$ whenever $\phi_2^2 < \Phi$. They only provide a value $\phi'$ of $\phi_2^2$ such that if $\phi_2^2 > \phi'$, then the bias is at least $(1-\delta)\epsilon$.

If, for example, $\delta = .5$ and $n_1 = 1$, as in the case of Student's ratio, we have, if $\gamma = 0$,

$$\phi_2^2 > \frac{.693}{2(n_2E_\epsilon^2 - 1)},$$

and if $\epsilon = .05$, then $E_\epsilon^2$ decreases steadily from .903 if $n_2 = 2$ to .063 if $n_2 = 60$, and the corresponding lower limits of $\phi_2^2$ decrease from .43 to .12. Thus, if $\phi_2^2 > .43$ or $.12$ in these two cases, it follows that the probability of rejecting the average null hypothesis will be not .05 but something less than .025.

If $\delta = .6$ and $n_1 = 4$, $n_2 = 30$, then we can evaluate the lower limit of $\phi_2^2$ for the example given in the introduction, finding that

$$\phi_2^2 > \frac{.51}{2(.279)(34) - 8} = .05$$

implies a downward bias of at least 40 per cent of .05. Also, if $\phi_2^2 > .07$, then for any value of $\phi_1^2$ the power of the analysis of variance test is reduced by at least 40 per cent.

² The procedure followed is given in [6] on pp. 4, 5, equations (2.2) through Lemma 1.
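The arithmetic behind the second example can be reproduced from (3.8). The sketch below takes $\delta = .6$, $n_1 = 4$, $n_2 = 30$ and $E_\epsilon^2 = .279$ from the text. It computes the threshold for $\gamma = 0$ and $\gamma = 2$, which comes out at about .047 and .073 (the .05 and .07 quoted above), and checks that at each threshold the exponential factor from the proof of Theorem 3 does not exceed $\delta$. The function names are illustrative.

```python
import math

def phi2_threshold(delta, e2, n1, n2, gamma):
    """Right-hand side of (3.8): -log(delta) / (2 [E^2 (n1 + n2) - (n1 + gamma)])."""
    denom = 2.0 * (e2 * (n1 + n2) - (n1 + gamma))
    if denom <= 0:
        raise ValueError("(3.8) requires E^2 > (n1 + gamma) / (n1 + n2)")
    return -math.log(delta) / denom

def exponent(phi2, e2, n1, n2, gamma):
    """-2 n2 phi^2 + 2 phi^2 (1 - E^2)(n1 + n2) + gamma phi^2, as in the proof of Theorem 3."""
    return -2 * n2 * phi2 + 2 * phi2 * (1 - e2) * (n1 + n2) + gamma * phi2

delta, e2, n1, n2 = 0.6, 0.279, 4, 30
for gamma in (0, 2):
    t = phi2_threshold(delta, e2, n1, n2, gamma)
    # thresholds are about 0.047 (gamma = 0) and 0.073 (gamma = 2)
    print(gamma, round(t, 3), math.exp(exponent(t, e2, n1, n2, gamma)))
```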

5. Conclusions. The rather sharp effects of a moderate lack of statistical control on the probabilities associated with the F-test indicate the importance of testing for statistical control outside of the industrial applications now made. Furthermore, it would seem advisable to investigate tests and designs that are less sensitive to lack of control than is the F-test.

REFERENCES

[1] P. C. Tang, "The power function of the analysis of variance tests with tables and illustrations of their use," Statistical Research Memoirs, Vol. 2 (1938), pp. 126-157.

[2] William G. Madow, "The distribution of quadratic forms in non-central normal random variables," Annals of Math. Stat., Vol. 11 (1940), pp. 100-104.

[3] W. G. Cochran, "The distribution of quadratic forms in a normal system, with applications to the analysis of covariance," Proc. Cambridge Phil. Soc., Vol. 30 (1934), pp. 178-191.

[4] A. T. Craig, "Note on the independence of certain quadratic forms," Annals of Math. Stat., Vol. 14 (1943), pp. 195-197.

[5] S. Kolodziejczyk, "On the important class of statistical hypotheses," Biometrika, Vol. 27 (1935), pp. 161-190.

[6] J. F. Daly, "On the unbiased character of likelihood-ratio tests for independence in normal systems," Annals of Math. Stat., Vol. 11 (1940), pp. 1-33. The procedure followed is given on pp. 4, 5, equations (2.2) through Lemma 1.



MIXTURE OF DISTRIBUTIONS 

By Herbert Robbins 

Department of Mathematical Statistics, University of North Carolina 

1. Summary. Mixtures of measures or distributions occur frequently in the theory and applications of probability and statistics. In the simplest case it may, for example, be reasonable to assume that one is dealing with the mixture, in given proportions, of a finite number of normal populations with different means or variances. The mixture parameter may also be denumerably infinite, as in the theory of sums of a random number of random variables, or continuous, as in the compound Poisson distribution.

The operation of Lebesgue-Stieltjes integration, $\int f(x)\,d\mu$, is linear with respect to both the integrand $f(x)$ and the measure $\mu$. The first type of linearity has as its continuous analog the theorem of Fubini on interchange of order of integration; the second type of linearity has a corresponding continuous analog which is of importance whenever one deals with mixtures of measures or distributions, and which forms the subject of the present paper. Other treatments of the same subject have been given ([1], [2]; see also [3], [4]), but it is hoped that the discussion given here will be useful to the mathematical statistician.

A general measure-theoretic form of the fundamental theorem is given in Section 2, and in Section 3 the theorem is formulated in terms of finite-dimensional spaces and distribution functions. The operation of convolution as an example of mixture is treated briefly in Section 4, while Section 5 is devoted to random sampling from a mixed population.

We shall refer to Theory of the Integral by S. Saks (second edition, Warszawa, 1937) as [S], and to Mathematical Methods of Statistics by H. Cramér (Princeton, 1946) as [C].

2. Mixture of measures in general. Let $X$ ($Y$) be a space with points $x$ ($y$), and let $\mathfrak{X}$ ($\mathfrak{Y}$) be a σ-field of subsets of $X$ ($Y$). Let $\nu$ be a measure on $\mathfrak{Y}$. Let $\mu_y$ be, for a.e. $(\nu)$ $y$, a measure on $\mathfrak{X}$ such that $\mu_y(S)$ is, for every $S$ in $\mathfrak{X}$, a measurable ($\mathfrak{Y}$) function of $y$. Define for every $S$ in $\mathfrak{X}$

$$\mu(S) = \int_Y \mu_y(S)\,d\nu. \tag{1}$$

Theorem 1. $\mu$ is a measure on $\mathfrak{X}$. If $\nu(Y) = \mu_y(X) = 1$, then $\mu(X) = 1$.

Proof. Clear.

Theorem 2. If $f(x)$ is any non-negative or non-positive function measurable ($\mathfrak{X}$), then the function

$$g(y) = \int_X f(x)\,d\mu_y \tag{2}$$

is measurable ($\mathfrak{Y}$), and

$$\int_X f(x)\,d\mu = \int_Y g(y)\,d\nu. \tag{3}$$

Proof. First let $f_0(x)$ be any non-negative simple function [S, p. 7] of the form

$$f_0(x) = \{a_1, S_1;\ \dots;\ a_k, S_k\}, \tag{4}$$

where the $S_i$ are disjoint sets in $\mathfrak{X}$ such that $X = \sum_i S_i$, and the $a_i$ are non-negative constants. Then

$$g_0(y) = \int_X f_0(x)\,d\mu_y = \sum_i a_i\,\mu_y(S_i) \tag{5}$$

is a non-negative function measurable ($\mathfrak{Y}$), and from (1) it follows that each side of (3) is equal to $\sum_i a_i\,\mu(S_i)$. Hence the theorem holds in this case.

Next let $f(x)$ be any non-negative function measurable ($\mathfrak{X}$); then [S, p. 14] there exists a sequence $f_n(x)$ of simple functions such that for every $x$,

$$0 \le f_1(x) \le f_2(x) \le \cdots; \qquad \lim_{n\to\infty} f_n(x) = f(x). \tag{6}$$

Setting

$$g_n(y) = \int_X f_n(x)\,d\mu_y, \qquad g(y) = \int_X f(x)\,d\mu_y, \tag{7}$$

it follows from the theorem of monotone convergence [S, p. 28] and from the preceding paragraph that

$$\int_X f(x)\,d\mu = \lim_{n\to\infty}\int_X f_n(x)\,d\mu = \lim_{n\to\infty}\int_Y g_n(y)\,d\nu, \tag{8}$$

$$g(y) = \lim_{n\to\infty}\int_X f_n(x)\,d\mu_y = \lim_{n\to\infty} g_n(y). \tag{9}$$

From (6) and (9) it follows that for a.e. $(\nu)$ $y$,

$$0 \le g_1(y) \le g_2(y) \le \cdots, \qquad \lim_{n\to\infty} g_n(y) = g(y). \tag{10}$$

Hence $g(y)$ is measurable ($\mathfrak{Y}$), and from the theorem of monotone convergence,

$$\int_Y g(y)\,d\nu = \lim_{n\to\infty}\int_Y g_n(y)\,d\nu. \tag{11}$$

Equation (3) now follows from (8) and (11).

By passing from $f(x)$ to $-f(x)$ we establish (3) when $f(x)$ is any non-positive function measurable ($\mathfrak{X}$). This completes the proof of Theorem 2.
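For finite spaces, (1) and (3) reduce to finite sums, which makes Theorems 1 and 2 easy to check directly. The sketch below uses a hypothetical three-point mixing measure $\nu$ and three measures $\mu_y$ on a four-point space (none of these numbers come from the paper). It compares $\int_X f\,d\mu$ with $\int_Y g\,d\nu$ in exact rational arithmetic.

```python
from fractions import Fraction

# Hypothetical finite example: Y = {0, 1, 2} with mixing measure nu,
# and for each y a probability measure mu_y on X = {0, 1, 2, 3}.
nu = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
mu_y = {
    0: {0: Fraction(1, 4), 1: Fraction(3, 4)},
    1: {1: Fraction(1, 2), 2: Fraction(1, 2)},
    2: {0: Fraction(1, 3), 3: Fraction(2, 3)},
}

def mixed_measure(S):
    """mu(S) = integral over Y of mu_y(S) d nu, as in (1)."""
    return sum(nu[y] * sum(p for x, p in mu_y[y].items() if x in S) for y in nu)

def f(x):
    return x * x          # a non-negative function on X

lhs = sum(f(x) * mixed_measure({x}) for x in range(4))            # integral of f d mu
g = {y: sum(f(x) * p for x, p in mu_y[y].items()) for y in nu}     # g(y) from (2)
rhs = sum(g[y] * nu[y] for y in nu)                                # integral of g d nu
print(lhs, rhs, mixed_measure({0, 1, 2, 3}))                       # 53/24 53/24 1
```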
If $f(x)$ is an arbitrary function measurable ($\mathfrak{X}$) we define

$$f^+(x) = \begin{cases} f(x) & \text{if } f(x) \ge 0,\\ 0 & \text{otherwise,}\end{cases} \qquad f^-(x) = \begin{cases} f(x) & \text{if } f(x) < 0,\\ 0 & \text{otherwise,}\end{cases} \tag{12}$$

so that

$$f(x) = f^+(x) + f^-(x) \tag{13}$$

is the sum of two functions measurable ($\mathfrak{X}$) of constant sign. By Theorem 2 the functions

$$g_1(y) = \int_X f^+(x)\,d\mu_y, \qquad g_2(y) = \int_X f^-(x)\,d\mu_y \tag{14}$$

are measurable ($\mathfrak{Y}$) and

$$0 \le \int_X f^+(x)\,d\mu = \int_Y g_1(y)\,d\nu \le \infty, \tag{15}$$

$$0 \ge \int_X f^-(x)\,d\mu = \int_Y g_2(y)\,d\nu \ge -\infty. \tag{16}$$

The integral $\int_X f(x)\,d\mu$ exists if and only if at least one of the two quantities (15) and (16) is finite [S, p. 20].

Theorem 3. A necessary and sufficient condition that

$$\int_X f(x)\,d\mu = \int_Y\Big\{\int_X f(x)\,d\mu_y\Big\}d\nu \tag{17}$$

is that at least one of the two quantities (15) and (16) be finite.

Proof. By the remark preceding Theorem 3 the condition is clearly necessary. Now suppose, e.g., that (15) is finite; we must show that (17) holds. By hypothesis,

$$\int_X f^+(x)\,d\mu < \infty, \qquad \int_X f(x)\,d\mu = \int_X f^+(x)\,d\mu + \int_X f^-(x)\,d\mu. \tag{18}$$

From (18) and (15) it follows that $0 \le g_1(y) < \infty$ for a.e. $(\nu)$ $y$; hence

$$\int_X f(x)\,d\mu_y = \int_X f^+(x)\,d\mu_y + \int_X f^-(x)\,d\mu_y = g_1(y) + g_2(y) \tag{19}$$

exists for a.e. $(\nu)$ $y$. From the finiteness of (15) it follows that

$$\int_Y (g_1(y) + g_2(y))\,d\nu = \int_Y g_1(y)\,d\nu + \int_Y g_2(y)\,d\nu \tag{20}$$

exists. Hence, from (19), the integral

$$\int_Y\Big\{\int_X f(x)\,d\mu_y\Big\}d\nu = \int_Y (g_1(y) + g_2(y))\,d\nu \tag{21}$$

exists. Equation (17) now follows from (21), (20), (15), and (18). This completes the proof of Theorem 3.

Corollary 1. If $\mu(X) < \infty$, and if $f(x)$ is bounded from above or from below, then both sides of (17) exist and the equality holds.
Proof. If, say, $f(x) \le C < \infty$, where we may take $C \ge 0$, then

$$0 \le \int_X f^+(x)\,d\mu \le C\cdot\mu(X) < \infty,$$

and the result follows from Theorem 3.

We shall now show by an example that the existence, and even the finiteness, of the right side of (17) does not imply the existence of the left side.

Let $X = Y = \{1, 2, \dots, n, \dots\}$ and let $\mathfrak{X}$ ($\mathfrak{Y}$) consist of all subsets of $X$ ($Y$). Let $\nu$ be the measure which assigns mass $c_n$ to $n$, where the $c_n$ are positive constants such that $\sum_1^\infty c_n = 1$. Let $\mu_n$ assign the mass $1/2n$ to each of the points $1, 2, \dots, 2n$. Let $f(x)$ be such that $f(1) = b_1$, $f(2) = -b_1$, $f(3) = b_2$, $f(4) = -b_2$, $\dots$, where the $b_n$ are positive constants. Then

$$\int_X f(x)\,d\mu_n = 0 \qquad (n = 1, 2, \dots),$$

so that

$$\int_Y\Big\{\int_X f(x)\,d\mu_n\Big\}d\nu = 0.$$

The measure $\mu$ defined by (1) assigns to each $n$ a positive value $\mu(n)$ given by

$$\mu(1) = \mu(2) = c_1\cdot(2)^{-1} + c_2\cdot(2\cdot 2)^{-1} + c_3\cdot(2\cdot 3)^{-1} + \cdots,$$

$$\mu(3) = \mu(4) = c_2\cdot(2\cdot 2)^{-1} + c_3\cdot(2\cdot 3)^{-1} + \cdots,$$

$$\cdots\cdots$$

where $\mu(X) = \sum_1^\infty \mu(n) = \sum_1^\infty c_n = 1$.

Now fix the $b_n$ and $c_n$ in such a way that

$$b_1\cdot\mu(1) + b_2\cdot\mu(3) + b_3\cdot\mu(5) + \cdots = \infty.$$

Then

$$\int_X f^+(x)\,d\mu = \infty, \qquad \int_X f^-(x)\,d\mu = -\infty,$$

so that the left side of (17) does not exist, even though $\nu(Y) = \mu_y(X) = \mu(X) = 1$ and the right side of (17) exists and is equal to zero.
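The counterexample can be made concrete. In the sketch below the constants are a hypothetical choice: $c_n = 1/(n(n+1))$, which is positive with sum 1, and $b_k = k^2$. With these, $b_k\,\mu(2k-1)$ tends to $1/4$, so the series $b_1\mu(1) + b_2\mu(3) + \cdots$ diverges. The code checks that every inner integral $\int_X f\,d\mu_n$ vanishes exactly and that truncated sums for $\int_X f^+\,d\mu$ keep growing. Because $\mu(2k-1)$ is computed by truncating the series in (1), the values are approximations from below.

```python
from fractions import Fraction

def c(n):
    """Hypothetical weights c_n = 1/(n(n+1)): positive, summing to 1."""
    return 1.0 / (n * (n + 1))

def b(k):
    """Hypothetical b_k = k**2, large enough to force divergence."""
    return k * k

def f(x):
    """f(1) = b_1, f(2) = -b_1, f(3) = b_2, f(4) = -b_2, ..."""
    k = (x + 1) // 2
    return b(k) if x % 2 == 1 else -b(k)

def inner_integral(n):
    """Integral of f with respect to mu_n (mass 1/(2n) on each of 1, ..., 2n), exactly."""
    return sum(Fraction(f(x), 2 * n) for x in range(1, 2 * n + 1))

def mu_point(x, n_terms):
    """mu({x}) from (1), truncating the series after n_terms terms.
    mu_n charges x only when x <= 2n, i.e. n >= ceil(x / 2)."""
    start = (x + 1) // 2
    return sum(c(n) / (2 * n) for n in range(start, n_terms + 1))

def positive_part_partial_sum(k_max, n_terms=2000):
    """Truncation of b_1 mu(1) + b_2 mu(3) + ..., i.e. of the integral of f+ d mu."""
    return sum(b(k) * mu_point(2 * k - 1, n_terms) for k in range(1, k_max + 1))

# Every inner integral vanishes, so the right side of (17) is 0 ...
print(all(inner_integral(n) == 0 for n in range(1, 51)))   # True
# ... while the positive part of the left side grows without bound.
print(positive_part_partial_sum(40), positive_part_partial_sum(80))
```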


3. A restatement of the preceding results in the form most useful in probability theory. Let $x = (x_1, \dots, x_n)$ be a point in the $n$-dimensional Euclidean space $R_n$, and let $\mathcal{B}_n$ denote the σ-field of Borel sets in $R_n$. Let $S_x$ denote the half-open interval in $R_n$ consisting of all points $(w_1, \dots, w_n)$ in $R_n$ satisfying the inequalities

$$w_1 \le x_1, \quad \dots, \quad w_n \le x_n; \tag{22}$$

then if $\mu$ is any probability measure on $\mathcal{B}_n$, the function

$$F(x) = \mu(S_x) \tag{23}$$

is the distribution function corresponding to $\mu$. Conversely, if $F(x)$ is any distribution function in $R_n$ [C, p. 80], there is a unique probability measure $\mu$ on $\mathcal{B}_n$ such that (23) holds. As a matter of notation we write, for any Borel measurable $f(x)$,

$$\int_{R_n} f(x)\,d\mu = \int_{-\infty}^{\infty} f(x)\,dF(x), \tag{24}$$

provided the integral on the left exists.

Now let $y = (y_1, \dots, y_m)$ be a point in $R_m$, let $G(y)$ be a distribution function, and let $\nu$ denote the corresponding probability measure on $\mathcal{B}_m$. Let $F(x, y)$ be, for a.e. $(\nu)$ $y$, a distribution function in $x$, and for every $x$ a Borel measurable function of $y$, and let $\mu_y$ be the corresponding probability measure on $\mathcal{B}_n$.

Theorem 4. The function

$$H(x) = \int_{-\infty}^{\infty} F(x, y)\,dG(y) \tag{25}$$

is a distribution function in $R_n$. Let $\mu$ denote the corresponding probability measure on $\mathcal{B}_n$. Then for any $S$ in $\mathcal{B}_n$, $\mu_y(S)$ is a Borel measurable function of $y$ and

$$\mu(S) = \int_{-\infty}^{\infty} \mu_y(S)\,dG(y). \tag{26}$$


Proof. Let $C$ denote the class of all Borel sets $S$ in $R_n$ such that $\mu_y(S)$ is a Borel measurable function of $y$. We shall show that $C$ is a normal class [S, p. 83].

(i) If $S_1, S_2, \dots$ is a sequence of disjoint sets in $C$ and if $S = \sum_{n=1}^{\infty} S_n$, then

$$\mu_y(S) = \mu_y\Big(\sum_{n=1}^{\infty} S_n\Big) = \sum_{n=1}^{\infty}\mu_y(S_n)$$

is a convergent series of Borel measurable functions and is therefore itself a Borel measurable function.

(ii) If $S_1 \supset S_2 \supset \cdots$ is a decreasing sequence of sets in $C$ and if $S = \bigcap_{n=1}^{\infty} S_n$, then

$$\mu_y(S) = \mu_y\Big(\bigcap_{n=1}^{\infty} S_n\Big) = \lim_{n\to\infty}\mu_y(S_n)$$

is the limit of a sequence of Borel measurable functions and is therefore a Borel measurable function.

Hence $C$ is a normal class. But $C$ contains every interval $S_x$, for $\mu_y(S_x) = F(x, y)$ was assumed to be a Borel measurable function of $y$ for every $x$. It follows [S, p. 85] that $C = \mathcal{B}_n$.

It now follows from Theorem 1 that the set function $\mu(S)$ defined by (26) is a probability measure on $\mathcal{B}_n$. The corresponding distribution function is the function $H(x)$ defined by (25). Thus Theorem 4 is proved.
Let $f(x) = f^+(x) + f^-(x)$ be any Borel measurable function. Then, from Theorem 2, the integrals

$$\int_{-\infty}^{\infty} f^+(x)\,dH(x) = \int_{-\infty}^{\infty} f^+(x)\,d_x\Big\{\int_{-\infty}^{\infty} F(x,y)\,dG(y)\Big\} = \int_{-\infty}^{\infty}\Big\{\int_{-\infty}^{\infty} f^+(x)\,d_xF(x,y)\Big\}dG(y), \tag{27}$$

$$\int_{-\infty}^{\infty} f^-(x)\,dH(x) = \int_{-\infty}^{\infty} f^-(x)\,d_x\Big\{\int_{-\infty}^{\infty} F(x,y)\,dG(y)\Big\} = \int_{-\infty}^{\infty}\Big\{\int_{-\infty}^{\infty} f^-(x)\,d_xF(x,y)\Big\}dG(y) \tag{28}$$

exist. The following theorem is an immediate consequence of Theorem 3 and Corollary 1.

Theorem 5. A necessary and sufficient condition that

$$\int_{-\infty}^{\infty} f(x)\,d_x\Big\{\int_{-\infty}^{\infty} F(x,y)\,dG(y)\Big\} = \int_{-\infty}^{\infty}\Big\{\int_{-\infty}^{\infty} f(x)\,d_xF(x,y)\Big\}dG(y) \tag{29}$$

is that the left side of (29) exist, i.e., that at least one of the quantities (27) and (28) be finite. This will be true in particular if $f(x)$ is bounded from above or from below.


4. The operation of convolution. An example of the general mixture (25) of distribution functions is the operation of convolution: if $F(x)$, $G(x)$ are two distribution functions in $R_1$, then $F(x, y) = F(x - y)$ satisfies the conditions of Theorem 4, so that

$$H(x) = \int_{-\infty}^{\infty} F(x - y)\,dG(y) \tag{30}$$

is also a distribution function in $R_1$, denoted by

$$H(x) = F(x) * G(x). \tag{31}$$

Corresponding to any distribution function $F(x)$ in $R_1$ is the characteristic function

$$\varphi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x), \tag{32}$$

which in turn uniquely determines $F(x)$ [C, p. 93].

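Theorem 6. Let $F(x)$, $G(x)$, $H(x)$ be distribution functions in $R_1$ and let $\varphi_1(t)$, $\varphi_2(t)$, $\varphi(t)$ be the corresponding characteristic functions. Then

$$H(x) = F(x) * G(x) \tag{33}$$

if and only if

$$\varphi(t) = \varphi_1(t)\cdot\varphi_2(t). \tag{34}$$

Proof. Assume (33) holds. Since $|e^{itx}| \le 1$, we have from Theorem 5 (applied to the real and imaginary parts of $e^{itx}$)

$$\begin{aligned}
\varphi(t) &= \int_{-\infty}^{\infty} e^{itx}\,d_x\Big\{\int_{-\infty}^{\infty} F(x-y)\,dG(y)\Big\} = \int_{-\infty}^{\infty}\Big\{\int_{-\infty}^{\infty} e^{itx}\,d_xF(x-y)\Big\}dG(y)\\
&= \int_{-\infty}^{\infty} e^{ity}\Big\{\int_{-\infty}^{\infty} e^{it(x-y)}\,d_xF(x-y)\Big\}dG(y) = \int_{-\infty}^{\infty} e^{ity}\Big\{\int_{-\infty}^{\infty} e^{itw}\,dF(w)\Big\}dG(y) = \varphi_1(t)\cdot\varphi_2(t).
\end{aligned} \tag{35}$$

The converse implication now follows from the fact that the characteristic function of a distribution determines the latter uniquely.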
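Theorem 6 can be illustrated with finite discrete distributions, where the convolution (30) is a finite double sum and the characteristic functions (32) are finite sums. The two distributions below are hypothetical. The check is that $\varphi(t) - \varphi_1(t)\varphi_2(t)$ vanishes up to floating-point rounding at a few values of $t$.

```python
import cmath

def convolve(p, q):
    """Distribution of X + Y for independent X ~ p, Y ~ q (finite, dict value -> probability)."""
    r = {}
    for x, px in p.items():
        for y, qy in q.items():
            r[x + y] = r.get(x + y, 0.0) + px * qy
    return r

def char_fn(p, t):
    """Characteristic function phi(t) = sum over x of p(x) exp(i t x), as in (32)."""
    return sum(px * cmath.exp(1j * t * x) for x, px in p.items())

# Hypothetical distributions F and G
p = {0: 0.2, 1: 0.5, 3: 0.3}
q = {-1: 0.4, 2: 0.6}
h = convolve(p, q)                  # H = F * G, as in (31)
for t in (0.0, 0.7, 1.9, -2.4):
    # differences are at floating-point rounding level
    print(t, abs(char_fn(h, t) - char_fn(p, t) * char_fn(q, t)))
```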

The importance of the operation $*$ in probability theory arises from the fact that if $X$, $Y$ are independent random variables with respective distribution functions $F(x)$, $G(x)$, and if $Z = X + Y$, then the distribution function $H(x)$ of $Z$ satisfies (33), since for any value of $a$,

$$H(a) = P[X + Y \le a] = \iint_{x+y\le a} dF(x)\,dG(y) = \int_{-\infty}^{\infty}\Big\{\int_{x\le a-y} dF(x)\Big\}dG(y) = \int_{-\infty}^{\infty} F(a-y)\,dG(y) = F(a) * G(a), \tag{36}$$

the evaluation of the double integral by an iterated integral following from Fubini's theorem [S, pp. 76-88]. However, (33) may hold without $X$, $Y$ being independent, and Theorem 6 shows that (34) will then hold also, and conversely.

An example where $H(x) = F(x) * G(x)$ without $X$, $Y$ being independent has been given by Cramér [C, p. 317, exercise 2]. We shall give another. Let the points $O, A, \dots, F$ in the $(x, y)$-plane be defined as follows:

$$O = (0, 0),\quad A = (1, 1),\quad B = (1/2, 1),\quad C = (0, 1/2),\quad D = (1, 0),\quad E = (1, 1/2),\quad F = (1/2, 0).$$

Let $f(x, y)$ have the value 2 inside the quadrilateral $OABC$ and the triangle $DEF$, and 0 elsewhere. Then if $f(x, y)$ is the joint frequency function of $X$, $Y$, it is easily seen that $X$ and $Y$ have uniform distributions on the intervals $0 \le x \le 1$, $0 \le y \le 1$ respectively, and that $Z = X + Y$ has the triangular distribution given by (33), although $X$ and $Y$ are not independent.

It would be interesting to know which distribution functions $F(x)$ are such that if $X$, $Y$, $Z = X + Y$ are random variables with the distribution functions $F(x)$, $F(x)$, $F(x) * F(x)$ respectively, then $X$ and $Y$ are necessarily independent. A rather trivial example of such a distribution function is the step function $F(x)$ with jumps of 1/2 at the points $x = 0$ and $x = 1$. It can be shown (oral communication by W. Hoeffding), in generalization of Cramér's example, that no absolutely continuous distribution function (e.g., the normal distribution function) has this property.
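The claims about the quadrilateral $OABC$ and the triangle $DEF$ can be checked on a grid. The sketch below evaluates $f$ at cell midpoints, using integer offsets to avoid ties on the boundary lines. The grid size and names are arbitrary choices. On this grid the marginal cell probabilities match the uniform density exactly, $P(Z \le 1/2)$ comes out at $1/8$ as the triangular distribution requires, and $P(Z \le 1)$ is close to $1/2$ (a grid approximation). The last line exhibits the dependence: $P(X < 1/4,\ Y > 3/4) = 0$, whereas independence with uniform marginals would give $1/16$.

```python
def density_at_cell(i, j, m):
    """f(x, y) at the midpoint of cell (i, j) of an m-by-m grid on the unit square
    (m divisible by 4). Midpoints are x = (i + 1/2)/m and y = (j + 1/2)/m, so
    y - x = (j - i)/m; the integer offset d = j - i avoids floating-point ties."""
    d = j - i
    in_oabc = 0 < d <= m // 2          # between OA (y = x) and CB (y = x + 1/2)
    in_def = d <= -(m // 2)            # below FE (y = x - 1/2)
    return 2.0 if (in_oabc or in_def) else 0.0

m = 400
cell = 1.0 / (m * m)                   # area of one grid cell
mass = [[density_at_cell(i, j, m) * cell for j in range(m)] for i in range(m)]

p_x = [sum(mass[i]) for i in range(m)]                        # P(X in i-th strip)
p_y = [sum(mass[i][j] for i in range(m)) for j in range(m)]   # P(Y in j-th strip)
# the midpoint of cell (i, j) has x + y = (i + j + 1)/m
p_z_half = sum(mass[i][j] for i in range(m) for j in range(m) if i + j + 1 <= m // 2)
p_z_one = sum(mass[i][j] for i in range(m) for j in range(m) if i + j + 1 <= m)
p_corner = sum(mass[i][j] for i in range(m // 4) for j in range(3 * m // 4, m))

print(max(abs(v * m - 1) for v in p_x + p_y))    # marginal densities equal to 1
print(p_z_half, p_z_one)                          # triangular law: 1/8 and about 1/2
print(p_corner, sum(p_x[: m // 4]) * sum(p_y[3 * m // 4:]))   # 0.0 versus 1/16
```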


5. The problem of random sampling from a mixed population. Let $G(v)$ be a distribution function in the real variable $v$, and let $F(u, v)$ be, for a.e. (relative to the measure corresponding to $G$) $v$, a distribution function in the real variable $u$, and for every $u$ a Borel measurable function of $v$. Let

$$H(u) = \int_{-\infty}^{\infty} F(u, v)\,dG(v); \tag{37}$$

then by Theorem 4, $H(u)$ is a distribution function in $R_1$. Now define, for $x = (x_1, \dots, x_n)$, $y = (y_1, \dots, y_n)$,

$$H(x) = H(x_1)\cdots H(x_n), \qquad G(y) = G(y_1)\cdots G(y_n). \tag{38}$$

Both $H(x)$ and $G(y)$ are then distribution functions in $R_n$. In particular, $H(x)$ is the distribution function of a random sample of $n$ independent variates, each with the distribution function (37). Set

$$F(x, y) = F(x_1, y_1)\cdots F(x_n, y_n); \tag{39}$$

then for a.e. (relative to the measure corresponding to $G$) $y$, $F(x, y)$ is a distribution function in $x$, and for every $x$, $F(x, y)$ is a Borel measurable function of $y$. By Fubini's theorem we have

$$\begin{aligned}
H(x) &= \int_{-\infty}^{\infty} F(x_1, y_1)\,dG(y_1)\cdots\int_{-\infty}^{\infty} F(x_n, y_n)\,dG(y_n)\\
&= \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} F(x_1, y_1)\cdots F(x_n, y_n)\,dG(y_1)\cdots dG(y_n)\\
&= \int_{-\infty}^{\infty} F(x, y)\,dG(y).
\end{aligned} \tag{40}$$

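Thus $H(x)$ is itself a mixture in the sense of Theorem 4. It follows from Theorem 5 that, for any Borel measurable function $f(x)$,

$$\int_{-\infty}^{\infty} f(x)\,dH(x) = \int_{-\infty}^{\infty}\Big\{\int_{-\infty}^{\infty} f(x)\,d_xF(x, y)\Big\}dG(y) \tag{41}$$

if and only if the left side of (41) exists. When written out in full, (41) becomes

$$\begin{aligned}
&\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(x_1, \dots, x_n)\,d_{x_1}\Big\{\int_{-\infty}^{\infty} F(x_1, y_1)\,dG(y_1)\Big\}\cdots d_{x_n}\Big\{\int_{-\infty}^{\infty} F(x_n, y_n)\,dG(y_n)\Big\}\\
&\qquad = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\Big\{\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(x_1, \dots, x_n)\,d_{x_1}F(x_1, y_1)\cdots d_{x_n}F(x_n, y_n)\Big\}dG(y_1)\cdots dG(y_n).
\end{aligned} \tag{42}$$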
Equation (41) is of particular interest in connection with the distribution of a statistic $t = t(x_1, \dots, x_n) = t(x)$. For any distribution function $J(x)$, let $K(t \mid J)$ denote the distribution function of $t$ when $x$ has the distribution function $J(x)$. If we set

$$j(x) = \begin{cases} 1 & \text{if } t(x) \le t,\\ 0 & \text{otherwise,}\end{cases} \tag{43}$$

then

$$K(t \mid J) = \int_{-\infty}^{\infty} j(x)\,dJ(x). \tag{44}$$

Since $j(x)$ is bounded, the left side of (41) exists for $f = j$, and hence from (41),

$$K(t \mid H(x_1)\cdots H(x_n)) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} K(t \mid F(x_1, y_1)\cdots F(x_n, y_n))\,dG(y_1)\cdots dG(y_n). \tag{45}$$

As an example, let $t(x)$ be Student's ratio

$$t = n^{1/2}\,\bar{x}/s, \tag{46}$$

let

$$F(u, v) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{u} e^{-(w-v)^2/2}\,dw, \tag{47}$$

and let

$$G(v) = \begin{cases} 0 & \text{for } v < -a,\\ \tfrac{1}{2} & \text{for } -a \le v < a,\\ 1 & \text{for } a \le v.\end{cases} \tag{48}$$

Then $H(u)$ will be the distribution function of a mixture, in equal proportions, of two normal populations with unit variances and with means $-a$, $a$ respectively, and $K(t \mid H(x_1)\cdots H(x_n))$ will be the distribution function of $t$ in random samples of $n$ from this non-normal population. On the other hand, $K(t \mid F(x_1, y_1)\cdots F(x_n, y_n))$ will be the distribution function of $t$ in sampling from successive normal populations with unit variances and means $y_1, \dots, y_n$ respectively. Relation (45) now becomes

$$K(t \mid H(x_1)\cdots H(x_n)) = \sum_{y_1, \dots, y_n} K(t \mid F(x_1, y_1)\cdots F(x_n, y_n))/2^n, \tag{49}$$

where the summation is over all $2^n$ sets $(y_1, \dots, y_n)$, each $y_i$ being either $-a$ or $a$. Due to the complexity of $K(t \mid F(x_1, y_1)\cdots F(x_n, y_n))$ (the frequency function of which is discussed in a forthcoming paper by the author), relation (49) is not very useful. In other cases, (45) may afford a considerable simplification in the evaluation of the distribution function of a statistic obtained in random sampling from a mixed population.
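Relation (49) is hard to evaluate for Student's ratio, but the structure of (45) is easy to see for a simpler statistic. The sketch below substitutes the sample mean $\bar x$ for $t(x)$; this is an illustrative choice, not the paper's example. The right side of (49) is then an average of $2^n$ exact normal probabilities, and the left side is estimated by Monte Carlo sampling from the mixture $H$. The values $n = 3$ and $a = 1.5$, the seed, and the number of replications are arbitrary, and the two sides agree only up to Monte Carlo error.

```python
import itertools
import random
from statistics import NormalDist

def right_side(t0, n, a):
    """Average over all 2^n patterns (y_1, ..., y_n), y_i = -a or a, of
    P(xbar <= t0) when x_i ~ N(y_i, 1) independently (exact normal CDF)."""
    total = 0.0
    for ys in itertools.product((-a, a), repeat=n):
        total += NormalDist(sum(ys) / n, 1 / n ** 0.5).cdf(t0)
    return total / 2 ** n

def left_side_mc(t0, n, a, reps=100_000, seed=1):
    """Monte Carlo estimate of P(xbar <= t0) for samples of n from the
    equal mixture of N(-a, 1) and N(a, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.choice((-a, a)) + rng.gauss(0.0, 1.0) for _ in range(n)) / n
        hits += xbar <= t0
    return hits / reps

for t0 in (-0.5, 0.3):
    print(t0, left_side_mc(t0, 3, 1.5), right_side(t0, 3, 1.5))
```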
SOME APPLICATIONS OF THE MELLIN TRANSFORM IN STATISTICS


By Benjamin Epstein 


Coal Research Laboratory, Carnegie Institute of Technology 


1. Summary. It is well known that the Fourier transform is a powerful analytical tool in studying the distribution of sums of independent random variables. In this paper it is pointed out that the Mellin transform is a natural analytical tool to use in studying the distribution of products and quotients of independent random variables. Formulae are given for determining the probability density functions of the product $\xi\eta$ and the quotient $\xi/\eta$, where $\xi$ and $\eta$ are independent positive random variables with p.d.f.'s $f(x)$ and $g(y)$, in terms of the Mellin transforms $F(s) = \int_0^\infty f(x)\,x^{s-1}\,dx$ and $G(s) = \int_0^\infty g(y)\,y^{s-1}\,dy$. An extension of the transform technique to random variables which are not everywhere positive is given. A number of examples, including Student's $t$-distribution and Snedecor's $F$-distribution, are worked out by the technique of this paper.


2. Introduction. It is well known [2], [3] that the Fourier transform is a useful analytical tool for studying the distribution of sums of independent random variables. It is our purpose in this paper to study another transform, which is useful in studying the distribution of the product of independent random variables. While it is perfectly true that one can reduce the study of the distribution of the random variable $\xi = \xi_1\xi_2\cdots\xi_n$, the product of $n$ independent positive random variables $\xi_1, \xi_2, \dots, \xi_n$, to the study of the distribution of the random variable $v = \log\xi = \log\xi_1 + \log\xi_2 + \cdots + \log\xi_n$, the sum of $n$ independent random variables, it seems worthwhile to study the distribution problem directly. There are advantages inherent in the direct attack on the distribution problem which are lost to a considerable degree if the problem is so transformed that the Fourier transform becomes applicable. In this paper we shall show that the direct application of the Mellin transform to the study of the distribution of products of independent random variables yields results of interest.


3. Connection between Mellin transforms and products of independent random variables. The key reason for the importance of Fourier transforms in studying the distribution of sums of independent random variables lies in the following result: if $\xi_1$ and $\xi_2$ are independent random variables with continuous¹ probability density functions (henceforth abbreviated as p.d.f.'s) $f_1(x)$ and $f_2(x)$, respectively, then the p.d.f. $f(x)$ of the random variable $\xi = \xi_1 + \xi_2$ is expressible as

$$f(x) = \int_{-\infty}^{\infty} f_1(x - y)\,f_2(y)\,dy = \int_{-\infty}^{\infty} f_2(x - y)\,f_1(y)\,dy. \tag{1}$$

¹ In this paper we shall assume throughout that we are dealing with random variables with continuous p.d.f.'s. The argument can be extended, with some changes, to distribution functions which are perfectly general, but for simplicity this will not be done here.
But since these expressions are just the Fourier convolutions of $f_1(x)$ and $f_2(x)$, it is small wonder that the Fourier transform plays such a basic role in studying the distribution properties of sums of independent random variables.

Consider now the following result for products of independent random variables [4], [5]: if $\xi_1$ is a random variable with continuous p.d.f. $f_1(x)$ and $\xi_2$, independent of $\xi_1$, is a positive random variable with continuous p.d.f. $f_2(x)$, then the p.d.f. $f(x)$ of the random variable $\xi = \xi_1\xi_2$ is expressible² as

$$f(x) = \int_0^{\infty}\frac{1}{y}\,f_1\!\left(\frac{x}{y}\right)f_2(y)\,dy. \tag{2}$$

But equation (2) is precisely in the form of a Mellin convolution of $f_1(x)$ and $f_2(x)$, and therefore it may be expected that the Mellin transform should be useful in studying the distribution of products of independent random variables.
It is useful to indicate briefly the properties of the Mellin transform. A detailed treatment of this transform will be found in [6], and we shall, therefore, stress only those portions of the theory of Mellin transforms which are of importance in the field of statistics. By definition, the Mellin transform $F(s)$ corresponding to a function $f(x)$ defined only³ for $x > 0$ is

$$F(s) = \int_0^{\infty} f(x)\,x^{s-1}\,dx. \tag{3}$$

Under certain restrictions on $f(x)$ [6, p. 47], $F(s)$, considered as a function of the complex variable $s$, is a function of exponential type, analytic in a strip parallel to the imaginary axis. The width of the strip is governed by the order of magnitude of $f(x)$ in the neighborhood of the origin and for large values of $x$; in particular, the strip of analyticity becomes a half-plane if $f(x)$ decays exponentially as $x \to \infty$. There is a reciprocal formula enabling one to go from the transform $F(s)$ to the function $f(x)$. This transformation is

$$f(x) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} x^{-s}F(s)\,ds \tag{4}$$

for all $x$ where $f(x)$ is continuous, where the path of integration is any line parallel to the imaginary axis and lying within the strip of analyticity of $F(s)$.


² More generally [4, p. 411], if $\xi_1$ and $\xi_2$ are independent random variables with continuous p.d.f.'s $f_1(x)$ and $f_2(x)$, then the p.d.f. of the random variable $\xi_1/\xi_2$ is expressible as

$$\int_{-\infty}^{\infty} |y|\,f_1(xy)\,f_2(y)\,dy.$$

In [4] analogous results are given for random variables with perfectly general distribution functions.

³ The reason for this restriction is that there are technical difficulties in defining a Mellin transform directly for a function defined over $(-\infty, \infty)$. In [6], for instance, the Mellin transform theory is given for functions defined only for positive values of the argument. In statistical terminology this means that we are restricting ourselves, for the moment, to positive random variables. This is, of course, an unnatural restriction, and we shall indicate later in the paper a simple device for treating such questions.
If, in particular, we are interested in applying Mellin transforms to p.d.f.'s of positive⁴ random variables, the analysis can be carried out rigorously. Also, as in the case of the Fourier transform, one has the desirable property that there is a one-one correspondence between p.d.f.'s and their transforms.

A number of common p.d.f.'s of positive random variables have simple Mellin transforms. For examples, see Table 1.

In terms familiar to the mathematical statistician, the Mellin transform of a positive random variable $\xi$ with continuous p.d.f. $f(x)$ is $E(\xi^{s-1})$, where

$$F(s) = E(\xi^{s-1}) = \int_0^{\infty} x^{s-1}f(x)\,dx. \tag{5}$$

The following three basic properties hold.

(i) The positive random variable $\eta = a\xi$, $a > 0$, has the Mellin transform $G(s) = a^{s-1}F(s)$. This is immediate, since

$$G(s) = E(\eta^{s-1}) = E(a^{s-1}\xi^{s-1}) = a^{s-1}F(s). \tag{6}$$

(ii) The positive random variable $\eta = \xi^a$ has the Mellin transform $G(s) = F(as - a + 1)$. To prove this we note that

$$G(s) = E(\eta^{s-1}) = E(\xi^{as-a}) = F(as - a + 1). \tag{7}$$

In particular, if $a = -1$, i.e., $\eta = 1/\xi$, then

$$G(s) = F(-s + 2).$$

This is a result which we shall have occasion to use later in the paper.

(iii) If $\xi_1$ and $\xi_2$ are independent positive random variables with Mellin transforms $F_1(s)$ and $F_2(s)$, respectively, then the Mellin transform of the product $\eta = \xi_1\xi_2$ is $G(s) = F_1(s)F_2(s)$. This is immediate, since

$$G(s) = E(\eta^{s-1}) = E[(\xi_1\xi_2)^{s-1}] = E(\xi_1^{s-1})\,E(\xi_2^{s-1}) = F_1(s)F_2(s). \tag{8}$$

More generally, if $\xi_1, \xi_2, \dots, \xi_n$ are independent positive random variables with Mellin transforms $F_1(s), F_2(s), \dots, F_n(s)$, then the Mellin transform of the random variable $\eta = \xi_1\xi_2\cdots\xi_n$ is $G(s) = F_1(s)F_2(s)\cdots F_n(s)$. This relationship is fundamental and justifies the introduction of Mellin transforms in studying products of independent random variables.
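Property (iii) can be checked numerically for two independent unit exponentials, for which $F_1(s) = F_2(s) = \Gamma(s)$. The sketch below computes the product density from (2), or equivalently (10), by quadrature. The substitutions $x = e^u$ and $y = e^v$ turn both integrals into integrals over the whole real line, where the trapezoidal rule is very accurate for smooth, rapidly decaying integrands. The sketch then takes the Mellin transform of the product density numerically. The integration ranges and step counts are assumptions tuned to this example, not general-purpose settings.

```python
import math

def trapezoid(fn, lo, hi, steps):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    inner = sum(fn(lo + k * h) for k in range(1, steps))
    return h * (0.5 * fn(lo) + inner + 0.5 * fn(hi))

def product_density(y):
    """Density of xi1 * xi2 for unit exponentials, f1(x) = f2(x) = exp(-x).
    With x = e^u (so dx / x = du): g(y) = integral of f1(y e^(-u)) f2(e^u) du."""
    return trapezoid(lambda u: math.exp(-y * math.exp(-u) - math.exp(u)), -40.0, 5.0, 450)

def mellin_transform_of_g(s):
    """G(s) = integral_0^inf y^(s-1) g(y) dy, computed with y = e^v."""
    return trapezoid(lambda v: math.exp(s * v) * product_density(math.exp(v)), -30.0, 6.0, 360)

for s in (1.0, 2.0, 3.0):
    # property (iii): G(s) should equal F1(s) F2(s) = Gamma(s)^2
    print(s, mellin_transform_of_g(s), math.gamma(s) ** 2)
```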

From (8) it is clear that we can find the p.d.f. $g(y)$ of the random variable $\eta$ which is the product of two positive independent random variables $\xi_1$ and $\xi_2$ with continuous p.d.f.'s $f_1(x)$ and $f_2(x)$. In fact, by the Mellin inversion formula,

$$g(y) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} y^{-s}G(s)\,ds = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} y^{-s}F_1(s)F_2(s)\,ds, \tag{9}$$

where the path of integration is any line parallel to the imaginary axis and lying within the strip of analyticity of $G(s)$. As in the case of characteristic functions, it can be shown that there is a one-one correspondence between p.d.f.'s and their Mellin transforms. Therefore, it follows that the p.d.f. $g(y)$ computed in this way must be precisely equal to

$$g(y) = \int_0^{\infty}\frac{1}{x}\,f_1\!\left(\frac{y}{x}\right)f_2(x)\,dx = \int_0^{\infty}\frac{1}{x}\,f_2\!\left(\frac{y}{x}\right)f_1(x)\,dx. \tag{10}$$

It is easy to verify this directly by showing that the Mellin transform of the right-hand side of (10) is $F_1(s)F_2(s)$ [6, p. 52], but this will not be done here. The essential point is that Equation (9) (which is sometimes easier to evaluate than Equation (10)) is a consequence of an algebraic formalism which is capable of revealing relationships which would otherwise remain hidden.

⁴ See footnote 3.

Table 1 (column headings: Mellin Transform | Region of Analyticity of Transform; the body of the table is not recoverable).

The p.d.f. $h(y)$ of $\eta = \xi_1/\xi_2$, the ratio of two independent positive random variables with continuous p.d.f.'s, can be found by finding the p.d.f. of the product of the independent random variables $\xi_1$ and $1/\xi_2$. If $F_1(s)$ and $F_2(s)$ are the Mellin transforms corresponding to $\xi_1$ and $\xi_2$, respectively, then by (ii) $F_2(-s+2)$ is the Mellin transform of $1/\xi_2$, and, therefore, the Mellin transform $H(s)$ of $\eta = \xi_1/\xi_2$ is $F_1(s)F_2(-s+2)$. Therefore, the p.d.f. $h(y)$ of $\eta$ is

$$h(y) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} y^{-s}H(s)\,ds = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} y^{-s}F_1(s)F_2(-s+2)\,ds. \tag{11}$$

This formula is useful in finding distributions such as Student's $t$ and Fisher's $z$.
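The same kind of check applies to property (ii) with $a = -1$ and to the transform $H(s)$ in (11). For independent unit exponentials $\xi_1$ and $\xi_2$, the ratio $\xi_1/\xi_2$ has density $1/(1+y)^2$. Property (ii) then predicts that the Mellin transform of $1/\xi_2$ is $\Gamma(2-s)$ and that the transform of the ratio is $F_1(s)F_2(-s+2) = \Gamma(s)\Gamma(2-s)$ for $0 < \operatorname{Re} s < 2$. The sketch below computes these Mellin transforms by quadrature on the real axis, using the substitution $x = e^v$. The quadrature settings are illustrative assumptions.

```python
import math

def trapezoid(fn, lo, hi, steps):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    inner = sum(fn(lo + k * h) for k in range(1, steps))
    return h * (0.5 * fn(lo) + inner + 0.5 * fn(hi))

def mellin(fn, s, lo=-80.0, hi=80.0, steps=3200):
    """F(s) = integral_0^inf x^(s-1) fn(x) dx, evaluated with x = e^v (dx = x dv)."""
    return trapezoid(lambda v: math.exp(s * v) * fn(math.exp(v)), lo, hi, steps)

def exp_density(x):
    return math.exp(-x)                  # f1 = f2: unit exponential

def reciprocal_density(x):
    return math.exp(-1.0 / x) / x ** 2   # density of 1 / xi2 for a unit exponential xi2

def ratio_density(y):
    return 1.0 / (1.0 + y) ** 2          # density of xi1 / xi2, obtained directly

for s in (0.5, 1.0, 1.5):
    print(s,
          mellin(reciprocal_density, s), math.gamma(2 - s),                    # property (ii), a = -1
          mellin(ratio_density, s), mellin(exp_density, s) * math.gamma(2 - s))  # H(s) = F1(s) F2(-s + 2)
```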

4. A modified Mellin transform procedure for finding the distribution of the product of independent random variables which are not everywhere positive. Up to this point we have limited ourselves to the application of the Mellin transform to finding the distribution of the product or ratio of two positive independent random variables. While it is true that a number of interesting probability density functions are defined only for positive⁵ values of the argument, it is certainly desirable that we be able to treat situations involving random variables capable of taking on both positive and negative values. A simple device for extending the Mellin transform treatment to the more general problem is to decompose the p.d.f.'s $f_1(x)$ and $f_2(x)$ of the independent random variables $\xi_1$ and $\xi_2$ into

$$f_1(x) = f_{11}(x) + f_{12}(x), \qquad f_2(x) = f_{21}(x) + f_{22}(x),$$

⁵ For example, distributions of type 3, the $\chi^2$ distribution, the distribution of the sample standard deviation and of the sample variance, the distribution of an even power of a random variable, etc., are all defined only for positive values of the argument.
where⁶

$$f_{11}(x) = 0,\ x < 0; \qquad f_{12}(x) = 0,\ x > 0; \qquad f_{21}(x) = 0,\ x < 0; \qquad f_{22}(x) = 0,\ x > 0,$$

and then to operate on the pairs $[f_{11}(x), f_{21}(x)]$, $[f_{11}(x), f_{22}(x)]$, $[f_{12}(x), f_{21}(x)]$, and $[f_{12}(x), f_{22}(x)]$ separately. More specifically, the frequency distribution $h(y)$ corresponding to the random variable $\eta = \xi_1\xi_2$ is made up of the sum of four components $h_1(y)$, $h_2(y)$, $h_3(y)$, and $h_4(y)$. To compute $h_1(y)$ one can apply the Mellin transform directly to the evaluation of the expression

$$h_1(y) = \int_0^\infty \frac{1}{x}\,f_{11}\!\left(\frac{y}{x}\right) f_{21}(x)\,dx,$$

since both $f_{11}(x)$ and $f_{21}(x)$ are zero for negative values of $x$. The function $h_1(y)$ is zero for $y < 0$. To compute $h_2(y)$ we first evaluate

$$h_2^*(y) = \int_0^\infty \frac{1}{x}\,f_{11}\!\left(\frac{y}{x}\right) f_{22}(-x)\,dx.$$

Again $f_{11}(x)$ and $f_{22}(-x)$ are zero for negative values of $x$ and, therefore, the conventional Mellin transform can be applied in determining $h_2^*(y)$. It is clear that $h_2^*(y) = 0$ for $y < 0$ and, therefore, $h_2(y) = h_2^*(-y) = 0$ for $y > 0$. Similarly, one can find $h_3(y)$ and $h_4(y)$, where $h_3(y) = 0$ for $y > 0$ and $h_4(y) = 0$ for $y < 0$, and it is readily seen that⁷

$$h(y) = h_1(y) + h_2(y) + h_3(y) + h_4(y)$$

is the desired p.d.f. of $\eta = \xi_1\xi_2$.
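The four-part bookkeeping above can be sketched numerically. This is a minimal illustration, not the author's procedure: each component is evaluated by direct quadrature of the positive-variable product integral rather than by Mellin inversion, and the helper names are hypothetical. The test case assumes $\xi_1, \xi_2$ uniform on $(-1, 1)$, for which the exact answer is $h(y) = -\tfrac12\ln|y|$ for $0 < |y| < 1$.

```python
import math


def split(f):
    """Split a density f into f_pos (zero for x < 0) and f_neg (zero for x > 0)."""
    def f_pos(x):
        return f(x) if x > 0 else 0.0

    def f_neg(x):
        return f(x) if x < 0 else 0.0

    return f_pos, f_neg


def reflect(f):
    """Density of -xi when xi has density f."""
    return lambda x: f(-x)


def positive_product(fa, fb, y, upper, steps=20000):
    """int_0^upper (1/x) fa(y/x) fb(x) dx for y > 0 (midpoint rule); both
    densities must vanish for negative arguments, and fb beyond `upper`."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += fa(y / x) * fb(x) / x
    return total * h


def product_density(f1, f2, y, upper):
    """h(y) = h1 + h2 + h3 + h4 for eta = xi1 * xi2 (y != 0)."""
    f11, f12 = split(f1)
    f21, f22 = split(f2)
    if y > 0:
        h1 = positive_product(f11, f21, y, upper)                    # xi1 > 0, xi2 > 0
        h4 = positive_product(reflect(f12), reflect(f22), y, upper)  # both negative
        return h1 + h4
    h2 = positive_product(f11, reflect(f22), -y, upper)  # xi1 > 0, xi2 < 0: h2(y) = h2*(-y)
    h3 = positive_product(reflect(f12), f21, -y, upper)  # xi1 < 0, xi2 > 0
    return h2 + h3


def uniform_sym(x):
    """Density of the uniform distribution on (-1, 1)."""
    return 0.5 if -1.0 < x < 1.0 else 0.0


print(product_density(uniform_sym, uniform_sym, 0.25, 1.0), -0.5 * math.log(0.25))
```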


5. Examples of use of Mellin transforms in finding the distributions of the product and quotient of independent random variables. Example 1: The distribution of $\eta = \xi_1\xi_2$, where $\xi_1$ and $\xi_2$ are independent random variables with p.d.f.'s $f_1(x)$ and $f_2(x)$, respectively, where

$$f_1(x) = f_2(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \qquad -\infty < x < \infty.$$

In this case

$$f_1(x) = f_{11}(x) + f_{12}(x) \quad\text{and}\quad f_2(x) = f_{21}(x) + f_{22}(x),$$


⁶ Of course, $f_{11}$, $f_{12}$, $f_{21}$, and $f_{22}$ are generally not p.d.f.'s since $\int_0^\infty f_{11}(x)\,dx$, $\int_{-\infty}^0 f_{12}(x)\,dx$, $\int_0^\infty f_{21}(x)\,dx$, $\int_{-\infty}^0 f_{22}(x)\,dx$ are no longer necessarily equal to one.

⁷ As in footnote 6, $h_1$, $h_2$, $h_3$, and $h_4$ are, in general, not p.d.f.'s.
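where

$$f_{11}(x) = 0,\ x < 0; \quad f_{12}(x) = 0,\ x > 0; \quad f_{21}(x) = 0,\ x < 0; \quad f_{22}(x) = 0,\ x > 0.$$

The random variable $\eta = \xi_1\xi_2$ has a p.d.f. $h(y) = h_1(y) + h_2(y) + h_3(y) + h_4(y)$, where

$h_1(y)$ is associated with $[f_{11}(x), f_{21}(x)]$,
$h_2(y)$ is associated with $[f_{11}(x), f_{22}(x)]$,
$h_3(y)$ is associated with $[f_{12}(x), f_{21}(x)]$,
and $h_4(y)$ is associated with $[f_{12}(x), f_{22}(x)]$.

It is sufficient to evaluate

$$h_1(y) = \int_0^\infty \frac{1}{x}\,f_{11}\!\left(\frac{y}{x}\right) f_{21}(x)\,dx = \int_0^\infty \frac{1}{x}\,f_{21}\!\left(\frac{y}{x}\right) f_{11}(x)\,dx,$$

since, $f_1$ and $f_2$ being even, the other three components follow from $h_1$ by symmetry. In this case

$$F_{11}(s) = \int_0^\infty x^{s-1}f_{11}(x)\,dx = \frac{1}{\sqrt{2\pi}}\int_0^\infty x^{s-1}e^{-x^2/2}\,dx = \frac{2^{(s-3)/2}}{\sqrt{\pi}}\,\Gamma(s/2),$$

analytic for $\mathrm{Re}(s) > 0$, and

$$F_{21}(s) = \int_0^\infty x^{s-1}f_{21}(x)\,dx = \frac{2^{(s-3)/2}}{\sqrt{\pi}}\,\Gamma(s/2).$$

Therefore,

$$H_1(s) = F_{11}(s)F_{21}(s) = \frac{2^{s-3}}{\pi}\,\Gamma^2(s/2),$$

$$h_1(y) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} y^{-s}\,\frac{2^{s-3}}{\pi}\,\Gamma^2(s/2)\,ds, \quad c > 0, \qquad = \frac{1}{2\pi}\,K_0(y), \quad y > 0 \quad [6, \text{p. } 197],$$

where $K_0(y)$ is Bessel's function of the second kind with a purely imaginary argument, of zero order. Similarly

$$h_2(y) = h_3(y) = \frac{1}{2\pi}\,K_0(-y),\ y < 0; \qquad h_4(y) = \frac{1}{2\pi}\,K_0(y),\ y > 0.$$

Therefore,

$$h(y) = h_1(y) + h_2(y) + h_3(y) + h_4(y) = \frac{1}{\pi}\,K_0(|y|), \qquad -\infty < y < \infty,$$

and this is the desired p.d.f. This result has been found by other methods and is given in [1, p. 1].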
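The closed form $h(y) = \pi^{-1}K_0(|y|)$ can be spot-checked numerically. The sketch below assumes the standard integral representation $K_0(y) = \int_0^\infty e^{-y\cosh t}\,dt$ for $y > 0$, and evaluates the product density directly as $h(y) = 2\int_0^\infty x^{-1}\varphi(y/x)\varphi(x)\,dx$, which is $h_1(y) + h_4(y)$ for $y > 0$ by the symmetry of the normal density. Truncation limits and step counts are ad hoc choices.

```python
import math


def normal_pdf(x):
    """Standard normal density (2 pi)^(-1/2) exp(-x^2 / 2)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def k0(y, upper=20.0, steps=20000):
    """K_0(y) = int_0^inf exp(-y cosh t) dt for y > 0, midpoint rule on (0, upper)."""
    h = upper / steps
    return h * sum(math.exp(-y * math.cosh((k + 0.5) * h)) for k in range(steps))


def normal_product_density(y, upper=12.0, steps=20000):
    """h1(y) + h4(y) = 2 int_0^inf (1/x) phi(y/x) phi(x) dx for y > 0."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += normal_pdf(y / x) * normal_pdf(x) / x
    return 2.0 * total * h


print(normal_product_density(1.0), k0(1.0) / math.pi)
```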

Example 2: The distribution of $\eta = \xi_1/\xi_2$, where $\xi_1$ and $\xi_2$ are independent random variables with p.d.f.'s $f_1(x)$ and $f_2(x)$, respectively, where

$$f_1(x) = f_2(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \qquad -\infty < x < \infty.$$

As in Example 1, one splits the determination of $h(y)$, the p.d.f. of $\eta$, into four parts: $h_1(y)$, $h_2(y)$, $h_3(y)$, $h_4(y)$. In the notation of Example 1 it is easy to show that $H_1(s)$, the Mellin transform of $h_1(y)$, is

$$F_{11}(s)F_{21}(-s+2) = \frac{2^{(s-3)/2}}{\sqrt{\pi}}\,\Gamma(s/2)\cdot\frac{2^{(-s-1)/2}}{\sqrt{\pi}}\,\Gamma(-s/2+1) = \frac{1}{4\sin(s\pi/2)},$$

so that

$$h_1(y) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} \frac{y^{-s}}{4\sin(s\pi/2)}\,ds, \quad 0 < c < 2, \qquad = \frac{1}{2\pi}\,\frac{1}{1+y^2}, \quad y > 0.$$

Similarly

$$h_2(y) = \frac{1}{2\pi}\,\frac{1}{1+y^2},\ y < 0; \quad h_3(y) = \frac{1}{2\pi}\,\frac{1}{1+y^2},\ y < 0; \quad h_4(y) = \frac{1}{2\pi}\,\frac{1}{1+y^2},\ y > 0.$$

Therefore,

$$h(y) = h_1(y) + h_2(y) + h_3(y) + h_4(y) = \frac{1}{\pi}\,\frac{1}{1+y^2}, \qquad -\infty < y < \infty.$$

This result has been found by other methods and given in [4, p. 411].
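As an independent check of Example 2 that avoids Mellin inversion altogether (a standard change-of-variables argument, not the method of the text), the ratio density can be written as $h(y) = \int_{-\infty}^{\infty}|x|\,f_1(xy)f_2(x)\,dx$; with the standard normal densities this integral is elementary:

$$h(y) = 2\int_0^\infty x\,\frac{e^{-x^2y^2/2}}{\sqrt{2\pi}}\,\frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx = \frac{1}{\pi}\int_0^\infty x\,e^{-x^2(1+y^2)/2}\,dx = \frac{1}{\pi}\,\frac{1}{1+y^2}.$$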

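Example 3: F-Distribution. Let $\xi_1, \ldots, \xi_m, \eta_1, \ldots, \eta_n$ be $(m+n)$ independent random variables, each normally distributed with mean zero and standard deviation $\sigma$. Let

$$\xi = \sum_{i=1}^m \xi_i^2, \qquad \eta = \sum_{j=1}^n \eta_j^2.$$

We want to find the p.d.f. $h(z)$ of $\zeta$, where $\zeta = \xi/\eta$. The p.d.f.'s $f(x)$ and $g(y)$ of $\xi$ and $\eta$, respectively, are:

$$f(x) = \frac{x^{m/2-1}e^{-x/2\sigma^2}}{2^{m/2}\sigma^m\,\Gamma(m/2)}, \quad x > 0, \qquad g(y) = \frac{y^{n/2-1}e^{-y/2\sigma^2}}{2^{n/2}\sigma^n\,\Gamma(n/2)}, \quad y > 0.$$

In this case

$$F(s) = \frac{2^{s-1}\sigma^{2s-2}\,\Gamma\!\left(s + \frac{m}{2} - 1\right)}{\Gamma(m/2)}, \quad \text{analytic for } \mathrm{Re}(s) > 1 - \frac{m}{2},$$

$$G(s) = \frac{2^{s-1}\sigma^{2s-2}\,\Gamma\!\left(s + \frac{n}{2} - 1\right)}{\Gamma(n/2)}, \quad \text{analytic for } \mathrm{Re}(s) > 1 - \frac{n}{2}.$$

The p.d.f. $h(z)$ has Mellin transform

$$H(s) = F(s)G(-s+2) = \frac{\Gamma\!\left(s + \frac{m}{2} - 1\right)\Gamma\!\left(-s + \frac{n}{2} + 1\right)}{\Gamma(m/2)\,\Gamma(n/2)}, \quad \text{analytic for } 1 - \frac{m}{2} < \mathrm{Re}(s) < 1 + \frac{n}{2}.$$

Therefore,

$$h(z) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} z^{-s}H(s)\,ds, \quad -\frac{m}{2} + 1 < c < \frac{n}{2} + 1,$$

$$= \frac{\Gamma\!\left(\frac{m+n}{2}\right)}{\Gamma(m/2)\,\Gamma(n/2)}\,\frac{z^{m/2-1}}{(z+1)^{(m+n)/2}}, \qquad z > 0.$$

A convenient way of carrying out the inversion is to use formula (d) in Table 1.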
In a similar way one can find Student's distribution, i.e., the distribution of $\zeta = \xi_0/\eta$, where $\eta = \sqrt{\sum_{i=1}^n \xi_i^2/n}$, and where $\xi_0, \xi_1, \ldots, \xi_n$ are $n+1$ independent random variables each having the distribution

$$\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-x^2/2\sigma^2}, \qquad -\infty < x < \infty.$$
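The Mellin pair used in Example 3 can be checked numerically. The sketch below (function names are illustrative) integrates $z^{s-1}h(z)$ after the substitution $z = e^u$ and compares the result with $H(s) = \Gamma(s - 1 + m/2)\,\Gamma(n/2 + 1 - s)/[\Gamma(m/2)\Gamma(n/2)]$ for a real $s$ inside the strip $1 - m/2 < s < 1 + n/2$. Note that $\sigma$ has cancelled out of both expressions, so it does not appear.

```python
import math


def f_ratio_pdf(z, m, n):
    """Density h(z) of zeta = xi / eta obtained in Example 3 (z > 0)."""
    return (math.gamma((m + n) / 2) * z ** (m / 2 - 1)
            / (math.gamma(m / 2) * math.gamma(n / 2) * (1 + z) ** ((m + n) / 2)))


def mellin_numeric(pdf, s, lo=-40.0, hi=40.0, steps=40000):
    """int_0^inf z^(s-1) pdf(z) dz, computed as int e^(u s) pdf(e^u) du (midpoint rule)."""
    du = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        u = lo + (k + 0.5) * du
        total += math.exp(u * s) * pdf(math.exp(u))
    return total * du


def mellin_closed(s, m, n):
    """H(s) = Gamma(s + m/2 - 1) Gamma(n/2 + 1 - s) / (Gamma(m/2) Gamma(n/2))."""
    return (math.gamma(s + m / 2 - 1) * math.gamma(n / 2 + 1 - s)
            / (math.gamma(m / 2) * math.gamma(n / 2)))


m, n = 4, 6
print(mellin_numeric(lambda z: f_ratio_pdf(z, m, n), 1.5), mellin_closed(1.5, m, n))
```

At $s = 1$ both sides reduce to the total probability, 1, which is a convenient extra check.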
It should be mentioned in conclusion that the Mellin transform is a natural tool to use in situations involving the products and quotients of independent uniformly distributed random variables, or in finding the products and/or quotients of independent random variables having Beta-distributions. In such cases formulae (b), (c) and (d) in Table 1 are useful.
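For instance (a worked illustration, not taken from the text), if $\xi_1, \ldots, \xi_k$ are independent and uniform on $(0, 1)$, each has Mellin transform $1/s$, so their product has Mellin transform $s^{-k}$. Substituting $y = e^{-u}$,

$$\int_0^1 y^{s-1}\,\frac{(-\ln y)^{k-1}}{(k-1)!}\,dy = \frac{1}{(k-1)!}\int_0^\infty e^{-su}u^{k-1}\,du = \frac{1}{s^k}, \qquad \mathrm{Re}(s) > 0,$$

so uniqueness gives the density $(-\ln y)^{k-1}/(k-1)!$ on $0 < y < 1$ for the product.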
THE ESTIMATION OF LINEAR TRENDS


By G. W. Housner and J. F. Brennan
California Institute of Technology

1. Summary. This paper deals with the problem of bivariate regression where both variates are random variables having a finite number of means distributed along a straight line. A regression statistic is derived which is independent of change in scale, so that a priori knowledge of the frequency distribution parameters is not required in order to obtain a unique estimate. The statistic is shown to be consistent. The efficiency of the estimate is discussed, and its asymptotic distribution is derived for the case when the random variables are normally distributed. A numerical example is presented which compares the performance of the statistic of this paper with that of other commonly used statistics. In the example it is found that the method of estimation proposed in this paper is more efficient.

2. Introduction. A problem that often arises in statistical work is the estimation of linear trends. In the general problem it is known or presumed that a linear functional relation exists among a set of variables, of the form

$$a + b_1X + b_2Y + b_3Z = 0.$$

The observed values of the variables are of the form

$$x_{ik} = X_i + \epsilon_{ik}, \qquad y_{ik} = Y_i + \eta_{ik}, \qquad \text{etc.}$$

That is, the $x_{ik}$ are random variables with means $X_i$, and $k = 1, 2, \ldots, N_i$ observed values of $x$ are associated with the mean $X_i$. The ordering of the $X_i$ is according to magnitude. Similarly there are the observed values $y_{ik}$, $z_{ik}$, and so forth. The $\epsilon_{ik}$ are random variables, with the same distribution for all $i$, with zero means. On the basis of a sample $O_n(x_{ik}, y_{ik}, z_{ik}, \ldots)$ it is desired to estimate the coefficients $a, b_1, b_2, b_3, \ldots$. One method used to estimate the coefficients is that of "weighted regression", which is essentially an application of the method of least squares. The problem has been studied by R. Allen, A. Wald and others.¹ The chief difficulty has been that the proposed methods of estimation require a priori knowledge of the variances of the random variables. Wald has proposed a statistic which avoids this difficulty but which may have a relatively low efficiency in cases often encountered in practice. In this paper there is described a bivariate statistic which appears to have comparatively high precision and which does not require prior knowledge of the variances of the random variables. A numerical example is given at the end of the paper to illustrate the comparative performances of different methods of estimation.

¹ For a brief history of work done on this problem see the paper by A. Wald in the Annals of Math. Stat., Vol. 11 (1940), p. 284.




3. The regression statistic. In the case of the bivariate problem, consider a sample

$$O_n(x_{ik}, y_{ik}), \qquad i = 1, 2, \ldots, n \quad\text{and}\quad k = 1, 2, \ldots, N_i,$$

where $N_i$ sample values $x_{ik}$, $y_{ik}$ are distributed about the means $X_i$, $Y_i$. Let the means be related by $Y_i = a + bX_i$, and let the random variables $x_{ik}$ be independent and have the same frequency distribution with variance $\sigma_1^2$ for all $i$, and the random variables $y_{ik}$ have independent frequency distributions with variance $\sigma_2^2$, the same for all $i$. An appropriate statistic for estimating $b$ is obtained by noting that a pair of sample points $(x_{ik}, y_{ik})$, $(x_{jl}, y_{jl})$ gives a sample value of the change in $y$ corresponding to a change in $x$. It may thus be said that a sample value of $b$ is

$$\frac{y_{ik} - y_{jl}}{x_{ik} - x_{jl}}. \tag{1}$$

Making use of the fact that

$$y_{ik} = a + bx_{ik} + \eta_{ik} - b\epsilon_{ik}, \tag{2}$$

equation (1) may be written

$$(x_{ik} - x_{jl})\,\frac{y_{ik} - y_{jl}}{x_{ik} - x_{jl}} = (x_{ik} - x_{jl})\,b + (\eta_{ik} - \eta_{jl}) - b(\epsilon_{ik} - \epsilon_{jl}).$$

Summing this equation over all combinations of points there is obtained

$$b = \frac{\sum_i\sum_j\sum_k\sum_l (y_{ik} - y_{jl})}{\sum_i\sum_j\sum_k\sum_l (x_{ik} - x_{jl})} - \frac{\sum_i\sum_j\sum_k\sum_l \left[(\eta_{ik} - \eta_{jl}) - b(\epsilon_{ik} - \epsilon_{jl})\right]}{\sum_i\sum_j\sum_k\sum_l (x_{ik} - x_{jl})}. \tag{3}$$

The summations in the above expression are to be carried out for

$$l = 1, 2, \ldots, N_j; \quad k = 1, 2, \ldots, N_i; \quad j = 1, 2, \ldots, (i-1); \quad i = 1, 2, \ldots, n.$$

The first term on the right side of equation (3) is an estimate of $b$ and the second term represents the deviation of the estimate from the true value. Accordingly, we take as an estimate of $b$ the statistic

$$\hat b = \frac{\sum_i\sum_j\sum_k\sum_l (y_{ik} - y_{jl})}{\sum_i\sum_j\sum_k\sum_l (x_{ik} - x_{jl})}. \tag{4}$$

This requires, of course, that the denominator be not equal to zero. Summing out the subscripts $k$ and $l$ reduces (4) to

$$\hat b = \frac{\sum_{i=1}^n\sum_{j=1}^{i-1} N_iN_j(\bar y_i - \bar y_j)}{\sum_{i=1}^n\sum_{j=1}^{i-1} N_iN_j(\bar x_i - \bar x_j)},$$
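where $\bar y_i$ is the mean value of the $y_{ik}$, and so forth. Summing out the subscript $j$ gives

$$\hat b = \frac{\sum_{i=1}^n\left(N_i\bar y_i\sum_{j=1}^{i-1}N_j - N_i\sum_{j=1}^{i-1}N_j\bar y_j\right)}{\sum_{i=1}^n\left(N_i\bar x_i\sum_{j=1}^{i-1}N_j - N_i\sum_{j=1}^{i-1}N_j\bar x_j\right)}. \tag{5}$$

This expression may be put in a more convenient form by using the identity

$$\sum_{i=1}^n\left(N_i\sum_{j=1}^{i-1}N_j\bar y_j\right) = \sum_{i=1}^n\left(N_i\bar y_i\sum_{j=i+1}^{n}N_j\right) = \sum_{i=1}^n\left(N_i\bar y_i\left(\sum_{j=1}^{n}N_j - \sum_{j=1}^{i}N_j\right)\right).$$

With this substitution (and the corresponding one for the $\bar x$ terms), and after changing the sign of both numerator and denominator, equation (5) becomes

$$\hat b = \frac{\sum_{i=1}^n N_i\bar y_i\left(\sum_{j=1}^n N_j - 2\sum_{j=1}^i N_j + N_i\right)}{\sum_{i=1}^n N_i\bar x_i\left(\sum_{j=1}^n N_j - 2\sum_{j=1}^i N_j + N_i\right)}. \tag{6}$$

This is the statistic for estimating the linear trend of bivariate data. It may be noted that its derivation is not based on the notion of fitting a line to the sample points. A line $y = \hat a + \hat bx$ may be fitted to the sample points by making it pass through the mean of the sample points, that is, by using the following estimate:

$$\hat a = \bar y - \hat b\bar x,$$

where $\bar y$ and $\bar x$ are the means of all the $y_{ik}$ and $x_{ik}$, respectively.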
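Estimate (4) and its grouped form (6) can be compared directly. In the sketch below (function names are illustrative, not the authors'), each group is a pair of lists holding the $x_{ik}$ and $y_{ik}$ observed about one mean, with groups listed in increasing order of $X_i$. The pairwise sum runs over $j < i$ exactly as in (4), while the grouped version uses only $N_i$, $\bar x_i$ and $\bar y_i$.

```python
from itertools import combinations


def slope_pairwise(groups):
    """Equation (4): summed y-differences over summed x-differences, taken over
    every pair of points from groups j < i (groups ordered by increasing X_i)."""
    num = den = 0.0
    for (xj, yj), (xi, yi) in combinations(groups, 2):   # group j precedes group i
        for xk, yk in zip(xi, yi):
            for xl, yl in zip(xj, yj):
                num += yk - yl
                den += xk - xl
    return num / den


def slope_grouped(groups):
    """Equation (6): weights N_i (sum_j N_j - 2 sum_{j<=i} N_j + N_i) on group means."""
    sizes = [len(xs) for xs, _ in groups]
    total = sum(sizes)
    num = den = 0.0
    running = 0                                  # sum_{j <= i} N_j
    for (xs, ys), n_i in zip(groups, sizes):
        running += n_i
        weight = n_i * (total - 2 * running + n_i)
        num += weight * sum(ys) / n_i
        den += weight * sum(xs) / n_i
    return num / den


def intercept(groups, b_hat):
    """a_hat = y_bar - b_hat * x_bar, using the means of all observations."""
    xs = [x for g in groups for x in g[0]]
    ys = [y for g in groups for y in g[1]]
    return sum(ys) / len(ys) - b_hat * sum(xs) / len(xs)


groups = [([0.9, 1.2], [2.1, 1.8]), ([2.0], [3.9]), ([3.1, 2.8, 3.0], [6.2, 5.7, 6.1])]
print(slope_pairwise(groups), slope_grouped(groups))
```

The grouped form needs only one pass over the group means, whereas the pairwise form grows with the square of the sample size. This is the practical reason for the reduction to (6).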


4. Consistency of the estimate. Having established the statistics $\hat b$ and $\hat a$, it is desirable to examine the consistency and efficiency of the estimates, particularly for $\hat b$. To determine that $\hat b$ is a consistent estimate we investigate the behavior of (6) as the number of sample points increases, that is, as the $N_i \to \infty$. We wish first to establish the following identity. Consider the sum of the following array of terms:

$$\begin{array}{l} A_1(A_1 + A_2 + \cdots + A_n) \\ A_2(A_1 + A_2 + \cdots + A_n) \\ \quad\vdots \\ A_n(A_1 + A_2 + \cdots + A_n) \end{array}$$

The sum may be written $\sum_{i=1}^n A_i\sum_{j=1}^n A_j$. Since the array is symmetric (the entry $A_iA_j$ equals the entry $A_jA_i$), the expression $2\sum_{i=1}^n A_i\sum_{j=1}^i A_j$ also gives the sum of the array, except for the fact that the terms along the principal diagonal are counted twice. We have, therefore,

$$\sum_{i=1}^n A_i\sum_{j=1}^n A_j = 2\sum_{i=1}^n A_i\sum_{j=1}^i A_j - \sum_{i=1}^n A_i^2.$$
11 11 1 
Rearranging terms we obtain the identity

$$\sum_{i=1}^n A_i\left(\sum_{j=1}^n A_j - 2\sum_{j=1}^i A_j + A_i\right) = 0. \tag{7}$$
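As a quick arithmetic check of (7), take $n = 3$ and $(A_1, A_2, A_3) = (1, 2, 3)$, so that $\sum_{j=1}^{3} A_j = 6$ and the partial sums $\sum_{j=1}^{i} A_j$ are $1, 3, 6$:

$$1\,(6 - 2 + 1) + 2\,(6 - 6 + 2) + 3\,(6 - 12 + 3) = 5 + 4 - 9 = 0.$$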


Now substituting (2) into (6) and making use of (7) with $A_i = N_i$, there is obtained

$$\hat b = b + \frac{\sum_{i=1}^n N_i\left(\sum_{j=1}^n N_j - 2\sum_{j=1}^i N_j + N_i\right)(\bar\eta_i - b\bar\epsilon_i)}{\sum_{i=1}^n N_i\left(\sum_{j=1}^n N_j - 2\sum_{j=1}^i N_j + N_i\right)\bar x_i}. \tag{8}$$

The $\eta_{ik}$ and $\epsilon_{ik}$ are random variables with zero means, so that as $N_i \to \infty$ the sample means $\bar\eta_i$ and $\bar\epsilon_i$ converge in probability to zero. As $N_i \to \infty$, $\bar x_i$ converges in probability to its mean $X_i$. In view of (7), and since the denominator in (8) is not equal to zero, the last term in (8) converges in probability to zero and $\hat b \to b$. The estimate is therefore consistent. A similar argument also shows the estimate $\hat a$ to be consistent.


5. Efficiency of the estimate. A general investigation of the efficiency of the estimate $\hat b$ is beyond the scope of this paper. We may note, however, that the efficiency of the estimate can be made to depend upon the grouping of the data; that is, the optimum efficiency of the estimate may depend upon the omission of some of the pairs $(y_{ik} - y_{jl})$ from the estimate. The maximum efficiency is obtained for $\hat b$ when the second term in (3) is minimized. This requires prior knowledge of the frequency distribution of the random variables $x$ and $y$; however, in applications a recognition of (3) may often indicate a practical method of increasing the efficiency.

In what follows we make an investigation of the precision of the estimate $\hat b$ for a special case which is of some practical interest. Let $x$ and $y$ be random variables as defined in the first part of the paper and consider the new variables $u$ and $v$ defined by $\hat b = v/u$, that is,

$$u = \sum_{i=1}^n\left[N_i\left(\sum_{j=1}^n N_j - 2\sum_{j=1}^i N_j + N_i\right)\bar x_i\right], \qquad v = \sum_{i=1}^n\left[N_i\left(\sum_{j=1}^n N_j - 2\sum_{j=1}^i N_j + N_i\right)\bar y_i\right].$$

The random variables $u$ and $v$ are then independently distributed with joint probability element $f(u)f(v)\,du\,dv$. Making the change of variable $u = r\cos\theta$, $v = r\sin\theta$, the probability element becomes $f(r, \theta)\,dr\,d\theta$, where $\tan\theta = v/u = \hat b$. Integrating out the variable $r$ gives the probability element for $\theta$. In what follows we investigate the distribution of $\theta$ for the case where $x$ and $y$ are normally distributed with the same variance. Since $u$ and $v$ are linear functions of $x$ and $y$ respectively, they are also normally distributed with the same standard deviation.
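We designate the means of $u$ and $v$ by $m_1$ and $m_2$ respectively and the standard deviation by $\sigma$. The probability element in $u$ and $v$ is then

$$\frac{1}{2\pi\sigma^2}\exp\left\{-\frac{1}{2\sigma^2}\left[(u - m_1)^2 + (v - m_2)^2\right]\right\}du\,dv. \tag{9}$$

Changing variables to $r$, $\theta$ and setting $m_1 = \bar r\cos\bar\theta$, $m_2 = \bar r\sin\bar\theta$, we obtain the following probability element:

$$\frac{1}{2\pi\sigma^2}\exp\left\{-\frac{1}{2\sigma^2}\left[(r\cos\theta - \bar r\cos\bar\theta)^2 + (r\sin\theta - \bar r\sin\bar\theta)^2\right]\right\}r\,dr\,d\theta.$$

Completing the square in $r$ and substituting $\phi = \theta - \bar\theta$ there is obtained

$$\frac{1}{2\pi\sigma^2}\exp\left\{-\frac{(r - \bar r\cos\phi)^2}{2\sigma^2}\right\}\exp\left\{-\frac{\bar r^2\sin^2\phi}{2\sigma^2}\right\}r\,dr\,d\phi. \tag{10}$$

To integrate out $r$, make the further change of variable

$$t = \frac{r}{\sigma} - \frac{\bar r}{\sigma}\cos\phi.$$

Setting $(\bar r/\sigma)\cos\phi = w$ for convenience in notation there is obtained

$$\frac{1}{2\pi}\exp\left(-\frac{\bar r^2}{2\sigma^2}\sin^2\phi\right)e^{-t^2/2}\,(t + w)\,dt\,d\phi. \tag{11}$$

The variable $t$ is to be integrated out of this expression. The corresponding limits of integration are exhibited by

$$\frac{1}{2\pi}\exp\left(-\frac{\bar r^2}{2\sigma^2}\sin^2\phi\right)\left[\int_{-w}^{\infty}(t + w)\,e^{-t^2/2}\,dt\right]d\phi. \tag{12}$$

Now as the number of points in the original sample increases, the value of $\bar r/\sigma$ also increases, and as $\sigma/\bar r \to 0$, with $|\phi| < \pi/2$, the value of $w \to \infty$. In this case (12) then approaches asymptotically

$$\frac{\bar r\cos\phi}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{\bar r^2}{2\sigma^2}\sin^2\phi\right)d\phi.$$

As $\sigma/\bar r \to 0$ this distribution shows that $\phi$ converges in probability to zero and that the distribution approaches asymptotically the normal form

$$\frac{\bar r}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{\bar r^2\phi^2}{2\sigma^2}\right)d\phi. \tag{13}$$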



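It is required then to examine the conditions under which $\sigma/\bar r$ assumes small values. If the variance of the original variables $x_{ik}$ and $y_{ik}$ is designated by $\sigma_1^2$, then, since $u$ and $v$ are linear functions of the $\bar x_i$ and $\bar y_i$ respectively, the variance of $u$ and of $v$ is

$$\sigma^2 = \sigma_1^2\sum_{i=1}^n N_i\left(\sum_{j=1}^n N_j - 2\sum_{j=1}^i N_j + N_i\right)^2. \tag{14}$$

Now $\bar r^2$ is the sum of the squares of the means of $u$ and $v$; since, by (7), the mean of $v$ is $b$ times the mean of $u$, we have

$$\bar r^2 = (1 + b^2)\left[\sum_{i=1}^n N_i\left(\sum_{j=1}^n N_j - 2\sum_{j=1}^i N_j + N_i\right)X_i\right]^2. \tag{15}$$

Dividing (14) by (15) we obtain

$$\left(\frac{\sigma}{\bar r}\right)^2 = \frac{\sigma_1^2}{1 + b^2}\;\frac{\sum_{i=1}^n N_i\left(\sum_{j=1}^n N_j - 2\sum_{j=1}^i N_j + N_i\right)^2}{\left[\sum_{i=1}^n N_i\left(\sum_{j=1}^n N_j - 2\sum_{j=1}^i N_j + N_i\right)X_i\right]^2}. \tag{16}$$

Inspection of (16) indicates that as the number of sample points $N_i$ increases, the value of $(\sigma/\bar r)^2$ decreases rapidly. To illustrate this we examine some particular cases. Consider first the case of four equally spaced means $X_i = 3\sigma_1 i$ $(i = 1, 2, 3, 4)$, and let there be one sample point for each mean $(N_i = 1)$. With these values there is obtained

$$\left(\frac{\sigma}{\bar r}\right)^2 = \frac{0.022}{1 + b^2}.$$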


For $b = 1$ the range $-9^\circ < \phi < +9^\circ$ includes 95% of the population defined by (13). As the number of points $N_i$ is increased, or as the number of means $X_i$ is increased, the value of $(\sigma/\bar r)^2$ decreases rapidly. Consider now eight equally spaced means $X_i$ $(i = 1, 2, \ldots, 8)$, again with one sample point for each mean $(N_i = 1)$. With these values there is obtained

$$\left(\frac{\sigma}{\bar r}\right)^2 = \frac{0.00045}{1 + b^2}.$$

For $b = 1$ the range $-1^\circ < \phi < +1^\circ$ includes 95% of the population defined by (13).
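Formula (16) is easy to evaluate for any grouping. The sketch below is an illustration under the reading of (14)–(16) given above, with weights $N_i(\sum_{j=1}^n N_j - 2\sum_{j=1}^i N_j + N_i)$ and a common variance $\sigma_1^2$; the function name is hypothetical. For the first case in the text, four means $X_i = 3\sigma_1 i$ with $N_i = 1$, it gives $(1+b^2)(\sigma/\bar r)^2 = 20/900 \approx 0.022$, consistent with the value quoted. Doubling every $N_i$ halves the ratio, illustrating the effect of more sample points.

```python
def precision_ratio(sizes, means, sigma1=1.0, b=0.0):
    """(sigma / r_bar)^2 from (16) for group sizes N_i and true means X_i,
    listed in increasing order of X_i."""
    total = sum(sizes)
    running = 0
    var_sum = 0.0    # sum_i N_i c_i^2; times sigma1^2 this is (14)
    mean_sum = 0.0   # sum_i N_i c_i X_i; its square times (1 + b^2) is (15)
    for n_i, x_i in zip(sizes, means):
        running += n_i
        c_i = total - 2 * running + n_i
        var_sum += n_i * c_i ** 2
        mean_sum += n_i * c_i * x_i
    return sigma1 ** 2 * var_sum / ((1.0 + b ** 2) * mean_sum ** 2)


means = [3.0, 6.0, 9.0, 12.0]            # X_i = 3 * sigma1 * i with sigma1 = 1
print(precision_ratio([1] * 4, means))   # about 0.022
print(precision_ratio([2] * 4, means))   # two points per mean
```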

It is clear that a very high degree of precision is obtained with the estimate $\hat b$ when there is a considerable number of sample points. However, this will also be true in general of other statistics, and it is really of interest to compare precisions in those cases where the statistics have a relatively low precision. A detailed comparison is beyond the scope of this paper. However, a direct comparison can be made very easily in the particular case when $x_i$ is a fixed variate and only $y_i$ is a random variable. For the sake of simplicity, let each $N_i = 1$; then the statistic for estimating $b$ is

$$\hat b = \frac{\sum_{i=1}^n y_i(i - \bar\imath)}{\sum_{i=1}^n x_i(i - \bar\imath)}, \qquad \bar\imath = \frac{n+1}{2}. \tag{17}$$

Since $\hat b$ is a linear function of the $y_i$, by a well known theorem its variance is

$$\sigma_{\hat b}^2 = \frac{\sigma_2^2\sum_{i=1}^n (i - \bar\imath)^2}{\left[\sum_{i=1}^n x_i(i - \bar\imath)\right]^2}.$$

The customary least squares regression line of $y$ on $x$ gives for the estimate of $b$ and its variance

$$\hat b_R = \frac{\sum_{i=1}^n (y_i - \bar y)(x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \sigma_{\hat b_R}^2 = \frac{\sigma_2^2}{\sum_{i=1}^n (x_i - \bar x)^2}. \tag{18}$$

In the particular case when the $x_i$ are equally spaced, $x_i = ci + d$, the estimates $\hat b$ and $\hat b_R$ are identical:

$$\hat b = \hat b_R = \frac{\sum_{i=1}^n y_i(i - \bar\imath)}{c\sum_{i=1}^n (i - \bar\imath)^2}. \tag{19}$$
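The coincidence of the two estimates for equally spaced $x_i$ can be checked on any data set. In this sketch (names are illustrative) the weights are $i - \bar\imath$ with $\bar\imath = (n+1)/2$. The function `variance_ratio` compares the two variance formulas above:

$$\frac{\sigma_{\hat b}^2}{\sigma_{\hat b_R}^2} = \frac{\sum(i-\bar\imath)^2\,\sum(x_i - \bar x)^2}{\left[\sum x_i(i-\bar\imath)\right]^2}.$$

By the Cauchy–Schwarz inequality this ratio is never below 1, and it equals 1 when the $x_i$ are equally spaced.

```python
def b_rank(xs, ys):
    """Fixed-x statistic: sum y_i (i - i_bar) / sum x_i (i - i_bar), i = 1..n."""
    n = len(xs)
    i_bar = (n + 1) / 2
    w = [i - i_bar for i in range(1, n + 1)]
    return sum(wi * y for wi, y in zip(w, ys)) / sum(wi * x for wi, x in zip(w, xs))


def b_least_squares(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def variance_ratio(xs):
    """Var(b_hat) / Var(b_hat_R) for fixed x_i (the y-variance cancels)."""
    n = len(xs)
    i_bar = (n + 1) / 2
    w = [i - i_bar for i in range(1, n + 1)]
    x_bar = sum(xs) / n
    return (sum(wi ** 2 for wi in w) * sum((x - x_bar) ** 2 for x in xs)
            / sum(wi * x for wi, x in zip(w, xs)) ** 2)


xs = [2.0 * i + 0.5 for i in range(1, 6)]          # equally spaced: c = 2, d = 0.5
ys = [1.1, 2.9, 5.2, 6.8, 9.3]
print(b_rank(xs, ys), b_least_squares(xs, ys), variance_ratio(xs))
print(variance_ratio([1.0, 1.5, 4.0, 4.2, 7.0]))   # unequal spacing: ratio > 1
```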

6. Numerical example. From a practical point of view the case where $x$ and $y$ are random variables is of greater interest than that where $x$ is a fixed variate. We give a numerical example of this case, comparing the statistic $\hat b$ with several other statistics. Consider the case where there is one sample point for each mean $X_i$. We shall evaluate the following:

1) The statistic of this paper, which for this case is

$$b_1 = \frac{\sum_{i=1}^n y_i(i - \bar\imath)}{\sum_{i=1}^n x_i(i - \bar\imath)}.$$

2) The statistic obtained by minimizing the sum of the squares of the $y$ deviations only:

$$b_2 = \frac{\sum_{i=1}^n (y_i - \bar y)(x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2}.$$

3) The statistic obtained by minimizing the sum of the squares of the orthogonal deviations:

$$b_3 = \frac{\sum (y_i - \bar y)^2 - \sum (x_i - \bar x)^2 + \sqrt{\left[\sum (y_i - \bar y)^2 - \sum (x_i - \bar x)^2\right]^2 + 4\left[\sum (y_i - \bar y)(x_i - \bar x)\right]^2}}{2\sum (y_i - \bar y)(x_i - \bar x)}.$$

TABLE I

Set | x1 | y1 | x2 | y2 | x3 | y3 | x4 | y4
1 | … | … | … | … | … | … | … | …
2 | … | … | … | … | … | … | … | …
3 | 1.0 | 1.4 | 1.6 | 2.1 | 2.8 | 3.2 | 4.4 | 4.3
4 | 0.6 | 0.7 | 1.8 | 2.0 | 3.3 | 2.6 | 3.8 | 4.0
5 | 0.7 | 1.4 | 1.7 | 1.7 | 2.7 | 3.4 | 4.1 | 4.1
6 | 1.0 | 1.2 | 1.6 | 2.1 | 2.9 | 2.6 | 3.6 | 4.0
7 | 1.3 | 0.7 | 1.7 | 2.1 | 2.7 | 2.9 | 4.0 | 3.6

TABLE II

Set | b1 | b2 | b3 | b4
1 | 1.160 | 1.068 | … | 1.162
2 | 1.066 | 1.009 | … | 1.027
3 | 0.860 | 0.843 | … | 0.870
4 | 0.946 | 0.896 | 0.924 | 0.830
5 | 0.875 | 0.867 | … | 1.000
6 | 0.978 | 0.939 | … | 0.846
7 | 1.044 | 0.969 | … | 1.000
Mean | 0.990 | 0.940 | 0.996 | 0.962
7 × Sample Var | 0.0686 | 0.0373 | 0.1058 | 0.0834


4) The statistic proposed by Wald²:

$$b_4 = \frac{\sum_{i=n/2+1}^{n} y_i - \sum_{i=1}^{n/2} y_i}{\sum_{i=n/2+1}^{n} x_i - \sum_{i=1}^{n/2} x_i}.$$

We apply these statistics to sample data having four means $X_i = i$ and $Y_i = i$ $(i = 1, 2, 3, 4)$. By means of a table of random numbers seven sets of data were obtained, each set having one sample point corresponding to each mean. These sample points are described by Table I, where it will be noted that the sample points were drawn from a discrete distribution. The estimates obtained from the four statistics are exhibited in Table II.

² Loc. cit.
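The four statistics can be written out as follows for one observation per mean, with the points listed in the order of the means. This is a sketch. $b_3$ is coded as the standard orthogonal-regression (major-axis) slope, which is our reading of the garbled formula above, and $b_4$ assumes an even number $n$ of points split into a lower and an upper half. On data lying exactly on a line, all four return the true slope.

```python
import math


def b1(xs, ys):
    """Statistic of this paper: sum y_i (i - i_bar) / sum x_i (i - i_bar)."""
    n = len(xs)
    w = [i - (n + 1) / 2 for i in range(1, n + 1)]
    return sum(a * y for a, y in zip(w, ys)) / sum(a * x for a, x in zip(w, xs))


def moments(xs, ys):
    """Centered sums of squares and products (Sxx, Syy, Sxy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def b2(xs, ys):
    """Least squares on the y deviations only."""
    sxx, _, sxy = moments(xs, ys)
    return sxy / sxx


def b3(xs, ys):
    """Least squares on the orthogonal deviations (major-axis slope)."""
    sxx, syy, sxy = moments(xs, ys)
    d = syy - sxx
    return (d + math.sqrt(d * d + 4 * sxy * sxy)) / (2 * sxy)


def b4(xs, ys):
    """Wald: (upper-half y sum - lower-half y sum) / (same for x)."""
    half = len(xs) // 2
    return (sum(ys[half:]) - sum(ys[:half])) / (sum(xs[half:]) - sum(xs[:half]))


xs = [1.0, 2.3, 2.9, 4.4]
ys = [2.0 * x - 0.5 for x in xs]    # exact line with slope 2
print([round(f(xs, ys), 12) for f in (b1, b2, b3, b4)])
```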

If the 28 sample points are treated as a single set of data and the four statistics in their appropriate forms are applied, the following set of estimates is obtained:

b1 | b2 | b3 | b4
0.9768 | 0.9183 | 0.9786 | 0.9496

The preceding computations show that the estimate $b_2$ is inferior to the other estimates, as would be expected. The estimate $b_3$ is most accurate when the 28 sample points are treated as a single set of data, with the estimate $b_1$ being only very slightly less accurate: $b_1 = 0.9768$ as compared to $b_3 = 0.9786$. When the individual sets of sample points 1 to 7 are considered, it is seen that the estimate $b_1$ is most accurate, with the estimate $b_3$ rather less accurate; the estimate $b_1$ is more precise than $b_3$, the sample variances being in the ratio $0.0686 \div 0.1058 = 0.65$. From a practical viewpoint we may also point out that the computation of $b_1$ requires very much less labor than the computation of $b_3$.



ON THE EFFECT OF DECIMAL CORRECTIONS ON ERRORS OF OBSERVATION

By Philip Hartman and Aurel Wintner
The Johns Hopkins University

1. Summary. Let $t$ be the true value of what is being measured and suppose that the error of observation has a symmetric normal distribution of standard deviation $\sigma$. The "rounding-off" error due to the reading of measurements to the nearest unit has a distribution and an expected value depending on $t$ and $\sigma$. It is shown that, for a fixed $\sigma > 0$, the expected value of the decimal correction, $r(t; \sigma)$, is an analytic function of $t$ which is odd, of period 1, positive for $0 < t < \tfrac12$, and has a convex arch as its graph on $0 \le t \le \tfrac12$. Furthermore, if $0 < t < \tfrac12$, both $r(t; \sigma)$ and its maximum value, $\max_t r(t; \sigma)$, are decreasing functions of $\sigma$.

2. Introduction. Let $X$ be an error of observation and let $\phi(x)$ denote the density of probability of the distribution of $X$. In particular,

$$\int_{-\infty}^{+\infty}\phi(x)\,dx = 1, \quad \text{where } \phi(x) \ge 0. \tag{1}$$

If $t$ is any fixed number, the density of probability of the distribution of $X + t$ is $\phi(x - t)$.

Besides the "instrumental error of observation", $X$, there is another error, that of the "rounding-off", which is carried along in the registration of the measurements. It is introduced by the circumstance that, if $\ldots, b, a$ are digits, and if $b$ denotes the last digit considered, then decimal fractions such as $\cdots ba$ and $\cdots ba\cdots$ are registered as $\cdots b$ if $a < 5$ and as $\cdots(b+1)$ if $a \ge 5$. Let the unit, in which the measurements are expressed, be so chosen that the first digit neglected becomes the first digit following the decimal point, i.e., that the error of the "rounding-off" is between $-\tfrac12$ and $\tfrac12$. Then, if $t$ denotes the true value of what is being measured, the remark made after (1) shows that the probability that the error of the decimal corrections be less than $x$ is given by

$$\sum_{n=-\infty}^{\infty}\int_{n-1/2}^{n+x}\phi(u - t)\,du,$$

if $|x| \le \tfrac12$, whereas this probability is 0 or 1 according as $x < -\tfrac12$ or $x > \tfrac12$. Since the last series can be written in the form

$$\sum_{n=-\infty}^{\infty}\int_{-1/2}^{x}\phi(u + n - t)\,du = \int_{-1/2}^{x}\sum_{n=-\infty}^{\infty}\phi(u + n - t)\,du, \qquad (\phi \ge 0), \tag{2}$$

it follows that the density of probability of the error due to the decimal corrections is

$$\sum_{n=-\infty}^{\infty}\phi(x + n - t) \ \text{ if } |x| < \tfrac12, \quad\text{and } 0 \text{ if } |x| > \tfrac12. \tag{3}$$
Consequently, if $r = r(t)$ denotes the expected value of the decimal error induced on the "true" value, $t$, of the observations, then

$$r(t) = \int_{-1/2}^{1/2} x\sum_{n=-\infty}^{\infty}\phi(x + n - t)\,dx. \tag{4}$$

Formula (4) is known.¹ It is usually based on its intuitive interpretation, which results if, on the one hand, (4) is written in the form

$$r(t) = \int_{-\infty}^{\infty} s(x)\,\phi(x - t)\,dx, \tag{5}$$

where

$$s(x) = x \ \text{ if } -\tfrac12 < x < \tfrac12, \quad\text{and}\quad s(x) = s(x + 1), \quad -\infty < x < \infty, \tag{6}$$

and, on the other hand, the periodic function (6) is thought of as representing the uniform distribution of the error of "rounding-off" over the arithmetical continuum over a period,

$$|x - n| < \tfrac12, \qquad (n = 0, \pm1, \ldots),$$

on the $x$-axis. Needless to say, the specification of $s(x)$ at the points $x = n + \tfrac12$, which are disregarded in the definition (6), is immaterial, since $s(x)$ occurs in (5) only as an integrable weight-factor, isolated values of which do not influence the integral.

It follows at once from (1), (5) and the continuity (almost everywhere) of (6) that $r(t)$ is continuous.

3. Fourier analysis of $r(t)$. Since the Fourier expansion of the periodic function (6) is

$$s(x) = -\pi^{-1}\sum_{n=1}^{\infty}(-1)^n n^{-1}\sin 2\pi nx \;\;(= s(x \pm 1) = \cdots), \qquad (|x| < \tfrac12), \tag{7}$$

it follows from (5) that²

$$r(t) = -\pi^{-1}\sum_{n=1}^{\infty}(-1)^n n^{-1}\int_{-\infty}^{\infty}\phi(x)\sin 2\pi n(x + t)\,dx. \tag{8}$$

Hence, if the sine in (8) is expressed in terms of $2\pi nx$ and $2\pi nt$,

$$\pi r(t) = -\sum_{n=1}^{\infty}(-1)^n n^{-1}(a_n\cos 2\pi nt + b_n\sin 2\pi nt), \tag{9}$$

¹ F. Zernike, "Wahrscheinlichkeitsrechnung und mathematische Statistik," Handbuch der Physik, Vol. 3 (1928), pp. 475–478.

² In view of (1), the term-by-term integration leading from (6) to (8) is justified by the fact that the partial sums of the series (7) are uniformly bounded. Correspondingly, the above deduction of (9) and (10) from (4) is equivalent to an application of Poisson's summation formula. In this regard, cf. A. Wintner, "The sum formulae of Euler-Maclaurin and the inversions of Fourier and Möbius," Am. Jour. of Math., Vol. 69 (1947), pp. 685–708, the end of §1 (p. 687) and its application on p. 697.
where

$$b_n + ia_n = \int_{-\infty}^{\infty}\phi(x)\exp(2\pi inx)\,dx, \qquad (n = 1, 2, \ldots). \tag{10}$$

Let it be assumed that positive and negative errors of observation, when of the same magnitude, are equally probable, i.e., that $\phi(x) = \phi(-x)$. Then (10) shows that $a_n$ becomes 0. Hence, (9) reduces to

$$r(t) = -\sum_{n=1}^{\infty}(-1)^n(c_n/n)\sin 2\pi nt, \tag{11}$$

where

$$c_n = \pi^{-1}\int_{-\infty}^{\infty}\phi(x)\cos 2\pi nx\,dx = 2\pi^{-1}\int_0^{\infty}\phi(x)\cos 2\pi nx\,dx. \tag{12}$$

Clearly, $r(t)$ is an odd function whenever the density $\phi(x)$ is even.

4. The normal case. Suppose that $\phi(x)$ is the density of a symmetric normal (Gaussian) distribution. Then, if $\sigma$ is the positive constant representing the standard deviation of the errors of observation,

$$\phi(x) = (2\pi\sigma^2)^{-1/2}\exp\left(-\tfrac12 x^2/\sigma^2\right), \qquad (0 < \sigma < \infty). \tag{13}$$

It is clear from (5) and (6) that

$$r(t) \to s(t) \quad \text{if } \sigma \to 0 \text{ in (13)}. \tag{14}$$

Actually, all that (14) says is a triviality, according to which the total error becomes the decimal error when the measurements become infinitely sharp. In this limiting case, that is, if $r(t) = s(t)$, it is seen from (6) that the graph of the periodic function $r = r(t)$ is piecewise linear, with jumps at the points $t = n - \tfrac12$, and is therefore discontinuous. If $\sigma = 0$ is replaced by $0 < \sigma < \infty$, the jumps of $r(t)$ at $t = n - \tfrac12$ disappear (cf. the end of §3) and, as will be proved below,

(I) $r(t)$ is an analytic function which is odd, of period 1, and positive for $0 < t < \tfrac12$ (hence negative for $-\tfrac12 < t < 0$), and

(II) the graph of $r = r(t)$ over the fundamental interval $0 \le t \le \tfrac12$ is a convex arch, no matter what the value of $\sigma$ in (13) may be.

Since $r$ now depends both on the "true" value, $t$, of the observations and on the "precision", $\sigma$, of the measurements, let $r$ be denoted by $r(t; \sigma)$. It will be shown that

(i) $\max r(t; \sigma)$, where the max refers to $t$ while $\sigma$ is fixed, is a decreasing function of $\sigma$, where $\sigma$ varies on the half-line $0 < \sigma < \infty$, and that, on the same half-line,

(ii) $r(t; \sigma)$ is a decreasing function of $\sigma$ at every fixed $t$ contained in the fundamental region $0 < t < \tfrac12$.

All of this seems to be clear for physical reasons. However, it is easy to give examples of distribution laws, distinct from (13), for which the above assertions become false.
5. The θ-function. As is well known,

$$\int_{-\infty}^{\infty}\exp\left(-\tfrac12 x^2/\sigma^2\right)\cos ux\,dx = (2\pi\sigma^2)^{1/2}\exp\left(-\tfrac12\sigma^2u^2\right).$$

Hence, the value of the integral (12) is $q^{n^2}/\pi$, if $q$ is an abbreviation for

$$q = \exp(-2\pi^2\sigma^2). \tag{15}$$

Consequently, if $r(t, q)$ is defined, in terms of the above $r(t; \sigma)$, by placing

$$r(t, q) = r(t; \sigma) \quad \text{in virtue of (15)}, \tag{16}$$

then (11) shows that³

$$r(t, q) = -\pi^{-1}\sum_{n=1}^{\infty}(-1)^n n^{-1}q^{n^2}\sin 2\pi nt. \tag{17}$$
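The series (17) can be compared numerically with the defining integral (4). The sketch below (function names are illustrative, and the truncation of both the series and the sum over $n$ is an ad hoc choice) evaluates both for the normal density (13). It also shows the behaviour asserted in (ii): at fixed $t$ in $(0, \tfrac12)$, $r$ is positive and decreases as $\sigma$ grows.

```python
import math


def r_series(t, sigma, terms=60):
    """Equation (17) with q = exp(-2 pi^2 sigma^2), truncated after `terms` terms."""
    q = math.exp(-2.0 * math.pi ** 2 * sigma ** 2)
    total = sum((-1) ** n * q ** (n * n) * math.sin(2.0 * math.pi * n * t) / n
                for n in range(1, terms + 1))
    return -total / math.pi


def r_direct(t, sigma, steps=2000, span=20):
    """Equation (4): int_{-1/2}^{1/2} x * sum_n phi(x + n - t) dx, with phi the
    normal density (13) and the sum truncated to |n| <= span (midpoint rule)."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = -0.5 + (k + 0.5) * h
        folded = sum(math.exp(-0.5 * ((x + n - t) / sigma) ** 2)
                     for n in range(-span, span + 1))
        total += x * folded
    return norm * total * h


print(r_series(0.25, 0.2), r_direct(0.25, 0.2))
print([round(r_series(0.25, s), 6) for s in (0.1, 0.2, 0.3, 0.4)])  # decreasing in sigma
```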

It will be noted that the range, $0 < \sigma < \infty$, of the standard deviation is mapped by (15) onto the range

$$0 < q < 1, \tag{18}$$

and that $\sigma$ decreases or increases according as $q$ increases or decreases.

Let partial differentiations with respect to $t$ and $q$ be denoted by primes and subscripts, respectively:

$$f' = \partial f/\partial t, \qquad f_q = \partial f/\partial q. \tag{19}$$

Thus, from (17),

$$r'(t, q) = -2\sum_{n=1}^{\infty}(-1)^n q^{n^2}\cos 2\pi nt, \tag{20}$$

and, as easily verified from (17),

$$r_q(t, q) = (-4\pi^2 q)^{-1}\,r''(t, q). \tag{21}$$
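The verification of (21) is a term-by-term comparison. Differentiating (17) twice with respect to $t$, and once with respect to $q$, gives

$$r''(t, q) = 4\pi\sum_{n=1}^{\infty}(-1)^n n\,q^{n^2}\sin 2\pi nt, \qquad r_q(t, q) = -\frac{1}{\pi q}\sum_{n=1}^{\infty}(-1)^n n\,q^{n^2}\sin 2\pi nt,$$

so that $r_q = -r''/(4\pi^2 q)$; this is the stated relation between $r_q$ and $r''$.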

Let $\theta(t, q)$ be defined by

$$\theta(t, q) = 1 + 2\sum_{n=1}^{\infty}q^{n^2}\cos nt \tag{22}$$

(so that $\theta(t, q)$ is, in the main, the elliptic theta-function usually denoted by $\vartheta_3$). It is known that

$$\theta(t, q) > 0 \tag{23}$$

and that⁴

$$\theta'(t, q) < 0 \ \text{ if } 0 < t < \pi \quad (\text{hence } \theta'(t, q) > 0 \text{ if } -\pi < t < 0). \tag{24}$$

The above assertions will be deduced from these facts.

³ Cf. F. Zernike, loc. cit.

⁴ For a simple proof, cf. A. Wintner, "On the shape of the angular case of Cauchy's distribution," Annals of Math. Stat., Vol. 18 (1948), pp. 589–593, §6.
6. Proof of (I)–(II) and (i)–(ii). First, it is seen from (17) and (22) that

$$r'(t, q) = 1 - \theta(2\pi t - \pi, q). \tag{25}$$

Hence,

$$r''(t, q) = -2\pi\,\theta'(2\pi t - \pi, q). \tag{26}$$

If (26) is compared with (24), it is seen that

$$r''(t, q) < 0 \ \text{ if } 0 < t < \tfrac12 \quad (\text{hence } r''(t, q) > 0 \text{ if } -\tfrac12 < t < 0). \tag{27}$$

Consequently, (I) and (II) follow, since, in view of (17),

$$r(\pm\tfrac12, q) = 0 = r(0, q). \tag{28}$$

Next, (21) and (27) imply that

$$r_q(t, q) > 0 \quad \text{for } 0 < t < \tfrac12. \tag{29}$$

Hence, (ii) follows from the fact that $q$ is a decreasing function of $\sigma$.

As to (i), let $t = t(q)$ denote that (unique) $t$-value on $0 < t < \tfrac12$ at which $r(t, q)$ assumes its maximum value, say $r^q$, so that

$$r^q = r(t(q), q), \qquad (0 < t(q) < \tfrac12). \tag{30}$$

Clearly, $t = t(q)$ is the only $t$-value on $0 < t < \tfrac12$ for which

$$r'(t, q) = 0. \tag{31}$$

Since $r'(t, q)$ possesses continuous partial derivatives with respect to $t$ and $q$, and since (27) implies that its partial derivative with respect to $t$, namely $r''(t, q)$, does not vanish at $t = t(q)$, it follows that the solution $t = t(q)$ of the equation (31) possesses a continuous derivative. Hence, the function (30) possesses a continuous derivative with respect to $q$, namely,

$$\frac{dr^q}{dq} = r'(t(q), q)\,\frac{dt(q)}{dq} + r_q(t(q), q). \tag{32}$$

But since $t = t(q)$ is a solution of (31), the identity (32) can be reduced to

$$\frac{dr^q}{dq} = r_q(t(q), q), \qquad (0 < t(q) < \tfrac12).$$

Consequently, (i) follows from (29), since $q$ is a decreasing function of $\sigma$.
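Assertion (i) can also be observed numerically. The sketch below locates the maximum of the truncated series (17) over $0 < t < \tfrac12$ for a few values of $\sigma$, using a crude grid search chosen only for simplicity; the names are illustrative. The maxima decrease as $\sigma$ increases, and the maximizing $t(q)$ lies strictly inside $(0, \tfrac12)$.

```python
import math


def r_series(t, sigma, terms=60):
    """Equation (17) with q = exp(-2 pi^2 sigma^2), truncated after `terms` terms."""
    q = math.exp(-2.0 * math.pi ** 2 * sigma ** 2)
    return -sum((-1) ** n * q ** (n * n) * math.sin(2.0 * math.pi * n * t) / n
                for n in range(1, terms + 1)) / math.pi


def max_over_t(sigma, grid=1000):
    """Grid approximation of (t(q), max_t r(t; sigma)) on 0 < t < 1/2."""
    best_t, best_r = 0.0, -math.inf
    for k in range(1, grid):
        t = 0.5 * k / grid
        value = r_series(t, sigma)
        if value > best_r:
            best_t, best_r = t, value
    return best_t, best_r


for sigma in (0.1, 0.2, 0.3, 0.5):
    print(sigma, max_over_t(sigma))
```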



WEIGHING DESIGNS AND BALANCED INCOMPLETE BLOCKS

By K. S. Banerjee
Pusa, Bihar, India

1. Introduction. Following a paper by Hotelling [1] on the weighing problem, Kishen [4] and Mood [2] furnished generalized solutions. This note consists of some additional remarks on the weighing problem when the weighing is restricted to be made on one pan.

Hotelling remarked that when the problem was to determine a particular difference or any other linear function of the weights, a different design should be sought to minimize the variance. An account of efficient designs of this kind has also been furnished in this note. The notations used by Hotelling and Mood have been used here.

2. Chemical balance problem. It has been shown by Mood that when $N \equiv 0 \pmod 4$, an optimum design exists if a Hadamard matrix $H_N$ exists, and is obtained by using any $p$ columns of $H_N$. When $N \equiv i \pmod 4$ $(i = 1, 2, 3)$, very efficient designs are obtained either by adding rows to, or deleting rows from, $H_{4k}$, making the resultant number of rows equal to $N$.

It has further been shown by Mood, in connection with this class of designs, that arrangements¹ are available which are more efficient than the one obtained by repeating the row of ones. As a matter of fact, if any row other than the row of ones is repeated, this leads to a design of the same efficiency as the repeated addition of the row of ones, for the determinant of $X'X$ remains exactly the same. That this is so will be clear from the following properties, which show the connection of the matrix $X$ with the determinant $|a_{ij}|$:

(i) Any two rows of the matrix $X$ can be interchanged without changing the determinant $|a_{ij}|$.

(ii) Any two columns of the matrix $X$ can be interchanged without changing the determinant $|a_{ij}|$.

(iii) The signs of all the elements in a column of the matrix $X$ may be changed without changing the determinant $|a_{ij}|$.
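The claim about repeated rows can be checked numerically. The sketch assumes a Sylvester-type Hadamard matrix of order $N = 8$, whose first row is all ones, and takes $p = 5$ of its columns; both choices are illustrative only. The columns of $H_N$ are orthogonal, so appending a second copy of any $\pm 1$ row $v$ gives $X'X = N I_p + v v'$. Its determinant $N^{p-1}(N + p)$ does not depend on which row was repeated.

```python
import numpy as np

def sylvester_hadamard(n: int) -> np.ndarray:
    """Hadamard matrix of order n (a power of 2) by Sylvester doubling; row 0 is all ones."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

N, p = 8, 5
X0 = sylvester_hadamard(N)[:, :p]        # any p columns of H_N (here the first p)

def det_with_repeated_row(i: int) -> float:
    """det(X'X) after appending a second copy of row i (N + 1 weighings in all)."""
    X = np.vstack([X0, X0[i]])
    return float(np.linalg.det(X.T @ X))

dets = [det_with_repeated_row(i) for i in range(N)]
expected = N ** (p - 1) * (N + p)        # det(N I + v v') with v'v = p
```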

¹ This had been independently shown by me before the paper of A. M. Mood was brought to my notice by H. Hotelling.

3. Spring balance problem. Mood has exhaustively discussed the designs when $N > p$. Efficient designs under this class will, however, be available from the arrangements afforded by the balanced incomplete block designs discussed in [3]. These designs will be represented by certain of the efficient submatrices of the $P_k$ of Mood.

Usually $v$ and $b$ are used to denote respectively the number of varieties and the number of blocks in the above-mentioned designs. Here $v$ will take the place of $p$, the number of objects to be weighed, and $b$ that of $N$, the number of weighings that can be made. The matrix $X'X$ in this case will take the form


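(1) $X'X = \begin{pmatrix} r & \lambda & \lambda & \cdots & \lambda \\ \lambda & r & \lambda & \cdots & \lambda \\ \lambda & \lambda & r & \cdots & \lambda \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \lambda & \lambda & \lambda & \cdots & r \end{pmatrix}$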


The variance of the estimated weight of each of the $p$ objects for such a design can easily be seen to be

(2) $\dfrac{r + \lambda(p - 2)}{(r - \lambda)\{r + \lambda(p - 1)\}}\,\sigma^2$, for zero bias,


where $p$ is the number of objects to be weighed, and $r$ and $\lambda$ have meanings similar to those in balanced incomplete block designs: $r$ is the number of times each object is weighed, and $\lambda$ is the number of times each pair of objects is weighed together.

Though the minimum minimorum $\sigma^2/N$ can never be attained by the objects to be weighed under such designs, $\sigma^2/N$ may nevertheless be kept as the standard against which the efficiency of a given design is calculated. The efficiency of the above design will therefore, for zero bias, be

(3) $\dfrac{(r - \lambda)\{r + \lambda(p - 1)\}}{N\{r + \lambda(p - 2)\}}$.

The identities well known in the theory of balanced incomplete blocks,

$bk = vr$, $\quad \lambda(v - 1) = r(k - 1)$,

may, upon replacing $b$ by $N$ and $v$ by $p$ to accord with the notation of weighing designs, be written

$r = Nk/p$, $\quad \lambda = r(k - 1)/(p - 1)$.

Upon substituting these in (3) we obtain the efficiency factor in the form

(4) $\dfrac{k^2(p - k)}{p(pk - 2k + 1)}$,

where $k$ is the number of plots per block, or the number of objects that can be weighed at a time.
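The substitution that turns (3) into (4) can be checked with exact rational arithmetic. The sketch below evaluates both formulas for a few balanced incomplete block parameter sets. These include the designs quoted in Section 4 and, as an extra illustrative case, the complementary $v = b = 7$, $k = 3$ design.

```python
from fractions import Fraction

def efficiency_3(N: int, p: int, r: int, lam: int) -> Fraction:
    """Efficiency (3) of a spring-balance design whose X'X has the form (1), zero bias."""
    return Fraction((r - lam) * (r + lam * (p - 1)), N * (r + lam * (p - 2)))

def efficiency_4(p: int, k: int) -> Fraction:
    """Efficiency factor (4), after substituting r = Nk/p and lam = r(k-1)/(p-1)."""
    return Fraction(k * k * (p - k), p * (p * k - 2 * k + 1))

# (v, b, r, k, lam) of balanced incomplete blocks used as weighing designs: v -> p, b -> N.
designs = [
    (7, 7, 4, 4, 2),
    (7, 7, 3, 3, 1),
    (15, 15, 8, 8, 4),
    (19, 19, 10, 10, 5),
]
results = {d: (efficiency_3(d[1], d[0], d[2], d[4]), efficiency_4(d[0], d[3])) for d in designs}
```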

If, instead of adopting repetitions of $P_k$, only $\binom{p}{k}$ weighings are made in all, the efficiency factor calculated for such a combinatorial design would be

$\dfrac{(r - \lambda)\{r + \lambda(p - 1)\}}{b\{r + \lambda(p - 2)\}}$, for zero bias,

where $r = \binom{p-1}{k-1}$, $\lambda = \binom{p-2}{k-2}$ and $b = \binom{p}{k}$. On simplification the above expression reduces to (4).
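A direct numerical check of this reduction: build the 0/1 design matrix with one weighing per $k$-subset and invert $X'X$. Then compare the resulting efficiency, $(\sigma^2/N)$ divided by the common variance, with (4). The $(p, k)$ pairs are arbitrary test cases, and $\sigma = 1$ is assumed.

```python
import itertools
import numpy as np

def combinatorial_design(p: int, k: int) -> np.ndarray:
    """0/1 design matrix with one weighing (row) for each k-subset of the p objects."""
    rows = [[1.0 if j in subset else 0.0 for j in range(p)]
            for subset in itertools.combinations(range(p), k)]
    return np.array(rows)

def individual_efficiency(X: np.ndarray) -> float:
    """(sigma^2/N) / Var(each estimated weight), with sigma = 1."""
    N = X.shape[0]
    variances = np.diag(np.linalg.inv(X.T @ X))
    if not np.allclose(variances, variances[0]):
        raise ValueError("design does not estimate all weights with equal precision")
    return (1.0 / N) / variances[0]

def efficiency_4(p: int, k: int) -> float:
    """Efficiency factor (4)."""
    return k * k * (p - k) / (p * (p * k - 2 * k + 1))
```

For example, `individual_efficiency(combinatorial_design(6, 3))` uses $b = 20$ weighings with $r = 10$ and $\lambda = 4$.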


It will be noticed that the efficiency of such designs depends only upon the total number of objects to be weighed and the number of such objects that can be weighed at a time.

These designs have the advantage that all the weights are estimated with equal precision. If somewhat more weighings have to be made than the number of blocks in a balanced incomplete block design affords, all the objects may be weighed together and this weighing repeated as many times as required. This is equivalent to the repeated addition of the row of ones. The repetition of the row of ones in particular is necessary to keep the weights estimable with equal precision, which may at times be demanded as a matter of necessity in certain experiments. Otherwise, any other single row or different rows of the matrix $X$ may be repeated, making the number of rows of $X$ equal to the total number of weighings proposed.

From the practical point of view also, it will be advantageous to connect the designs for weighing with the already existing balanced incomplete block designs, which have been highly developed in recent years and are being extensively used in agro-biological investigations.

4. Spring balance design for small p. Under this class of designs, Mood has found the most efficient design for $p = 7$. It is given by

$L_7 = \begin{pmatrix} 1&0&1&0&1&0&1 \\ 0&1&1&0&0&1&1 \\ 0&0&0&1&1&1&1 \\ 1&1&0&0&1&1&0 \\ 0&1&1&1&1&0&0 \\ 1&0&1&1&0&1&0 \\ 1&1&0&1&0&0&1 \end{pmatrix}$.

This $L_7$ is easily recognized to be the design for $k = 4$, $b = 7$, $v = 7$, $r = 4$, $\lambda = 2$, given by an orthogonal series [3]. It is therefore seen that Hadamard matrices lead to a new method of constructing balanced incomplete block designs of a certain class. For example, $H_{16}$ and $H_{20}$ lead respectively to the designs for $k = 8$, $b = 15$, $v = 15$, $r = 8$, $\lambda = 4$ (or $k = 7$, $b = 15$, $v = 15$, $r = 7$, $\lambda = 3$) and for $k = 10$, $b = 19$, $v = 19$, $r = 10$, $\lambda = 5$ (or $k = 9$, $b = 19$, $v = 19$, $r = 9$, $\lambda = 4$). These designs also satisfy the condition of maximum efficiency, by virtue of the fact that $|L_N|$ has the value

$(N + 1)^{(N+1)/2}/2^N$,

as shown by Mood.
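One way to see the Hadamard-to-block-design connection is to delete the first row and column of a normalized $H_n$ and map $-1 \to 1$, $+1 \to 0$. This yields a 0/1 matrix with $v = b = n - 1$, $k = r = n/2$ and $\lambda = n/4$. The sketch below uses the Sylvester construction as an illustrative choice of $H_n$. It checks $L'L = (r - \lambda)I + \lambda J$ and Mood's value $|L_N| = (N+1)^{(N+1)/2}/2^N$ for the printed $L_7$ and for the matrices obtained from $H_8$ and $H_{16}$.

```python
import numpy as np

def sylvester_hadamard(n: int) -> np.ndarray:
    """Sylvester-type Hadamard matrix of order n (a power of 2); first row and column all +1."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def design_from_hadamard(H: np.ndarray) -> np.ndarray:
    """Drop the first row and column of a normalized H_n and map -1 -> 1, +1 -> 0."""
    return (1 - H[1:, 1:]) // 2

L7_printed = np.array([[1, 0, 1, 0, 1, 0, 1],
                       [0, 1, 1, 0, 0, 1, 1],
                       [0, 0, 0, 1, 1, 1, 1],
                       [1, 1, 0, 0, 1, 1, 0],
                       [0, 1, 1, 1, 1, 0, 0],
                       [1, 0, 1, 1, 0, 1, 0],
                       [1, 1, 0, 1, 0, 0, 1]])
L7_from_H8 = design_from_hadamard(sylvester_hadamard(8))
L15_from_H16 = design_from_hadamard(sylvester_hadamard(16))
```

The construction's $L_7$ need not coincide row for row with the printed one. It shares the same $(v, k, \lambda)$ and determinant.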


5. Determination of a linear function of the objects. An orthogonalized design which is cent per cent efficient for determining individually the weights of $p$ unknown objects is not necessarily the design of maximum efficiency for estimating a linear function of the objects. To illustrate this, let there be three objects, with weights $O_1$, $O_2$, $O_3$, to be estimated on a balance corrected for zero bias, and let us concentrate on the design characterized by the matrix given below.

(5)

As has been indicated in the previous papers, the variance of each of the unknown objects comes out to be $\tfrac14\sigma^2$, which is the minimum minimorum. The above design therefore enjoys cent per cent efficiency as far as individual estimation is concerned. But in estimating a linear function of the objects, for instance the total weight, designs more efficient than this are available. The variance of $l_1O_1 + l_2O_2 + l_3O_3$ is known to be

(6) $\sigma^2\sum_i\sum_j l_i l_j C_{ij}$,

where $C_{ij}$ denotes the elements of the matrix reciprocal to the matrix $X'X$. As the above design furnishes the estimates of the unknown objects orthogonally, the variance of the estimated total weight of the three objects will be $\tfrac34\sigma^2$. If, however, the design given by the matrix

(7) $X = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}$

is adopted, the variance of the estimated total weight may easily be seen to be $(3/7)\sigma^2$, by putting $l_1 = l_2 = l_3 = 1$. Since $(3/7)\sigma^2$ is less than $\tfrac34\sigma^2$, with four weighings the design characterized by (7) is more efficient in estimating the total weight than that characterized by (5). A still more efficient design for getting the total weight is simply to weigh all the objects together four times.
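Formula (6) makes the comparison concrete. The matrix printed as (5) is not recoverable from this copy, so the sketch uses a hypothetical orthogonal design with $X'X = 4I$, taken from three columns of a Sylvester $H_4$. It also uses a pseudo-inverse in place of $C$, because weighing all objects together makes $X'X$ singular even though the total weight remains estimable.

```python
import numpy as np

def variance_of_linear_function(X, l, sigma2=1.0):
    """Formula (6): sigma^2 * l' C l, with a pseudo-inverse of X'X standing in for C."""
    C = np.linalg.pinv(X.T @ X)
    l = np.asarray(l, dtype=float)
    return sigma2 * float(l @ C @ l)

total = [1, 1, 1]
# Hypothetical orthogonal chemical-balance design (X'X = 4I), not the paper's (5).
X_orth = np.array([[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]], dtype=float)
X_7 = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)   # design (7)
X_all = np.ones((4, 3))                                                  # all together, 4 times

variances = {name: variance_of_linear_function(X, total)
             for name, X in [("orthogonal", X_orth), ("(7)", X_7), ("all together", X_all)]}
```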


6. Designs with arrangements afforded by balanced incomplete blocks. An efficient design to estimate a linear function of the objects (to be precise, the total weight) will perhaps be needed only when the objects cannot all be weighed together on a single pan. Here also, an efficient design under the supposition that all the objects cannot be weighed together is afforded by the arrangements in balanced incomplete blocks. In such a design, the diagonal elements of the matrix reciprocal to $X'X$ will all be positive and equal to

(8) $\dfrac{r + \lambda(p - 2)}{(r - \lambda)\{r + \lambda(p - 1)\}}$,

while the remaining elements of the reciprocal matrix will be negative and equal to

(9) $-\dfrac{\lambda}{(r - \lambda)\{r + \lambda(p - 1)\}}$.

Using the generalized form of (6), and admitting the possibility that any of the arbitrary constants $l_i$ may be negative, the variance of the linear function $\sum_i l_i O_i$ may easily be seen to be

(10) $\left[\dfrac{\sum_i l_i^2}{r - \lambda} - \dfrac{\lambda\left(\sum_i l_i\right)^2}{(r - \lambda)\{r + (p - 1)\lambda\}}\right]\sigma^2$.

If the coefficients $l_i$ in the above expression are all equal to 1, (10) is the variance of the estimated total weight, and reduces to

(11) $\dfrac{p\,\sigma^2}{r + (p - 1)\lambda}$.

When there are $N$ weighings in all, the minimum variance that can be reached is $\sigma^2/N$, and it will be attained, it appears, only when all the objects are weighed together and the weighing is repeated $N$ times. The efficiency of a given design may therefore be calculated with reference to $\sigma^2/N$. Remembering that the number of weighings takes the place of the number of blocks and $p$ the place of $v$, the efficiency of the design reduces to $(k/p)^2$, where $k$ is the number of plots per block, i.e., the number of objects that can be weighed at a time.

If, however, the combinatorial arrangement is adopted, weighing all possible combinations of $k$ objects and making $\binom{p}{k}$ weighings in all, the same efficiency as above will be obtained.

Given $k$, the above expression of efficiency will therefore be the deciding factor in the choice between an arrangement in balanced incomplete blocks and all possible combinations of $k$ objects.
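The entries (8) and (9), the total-weight variance (11), and the efficiency $(k/p)^2$ can all be confirmed numerically. The sketch inverts $X'X$ for the $L_7$ design of Section 4 ($p = 7$, $k = 4$) and for the design of all 3-subsets of 6 objects. The latter is an illustrative choice with $N = 20$, $r = 10$, $\lambda = 4$, and $\sigma = 1$ is assumed.

```python
import itertools
import numpy as np

def total_weight_stats(X):
    """Reciprocal matrix C of X'X, Var(estimated total) from (6) with sigma = 1, and its efficiency."""
    N, p = X.shape
    C = np.linalg.inv(X.T @ X)
    ones = np.ones(p)
    var_total = float(ones @ C @ ones)
    return C, var_total, (1.0 / N) / var_total

L7 = np.array([[1, 0, 1, 0, 1, 0, 1],
               [0, 1, 1, 0, 0, 1, 1],
               [0, 0, 0, 1, 1, 1, 1],
               [1, 1, 0, 0, 1, 1, 0],
               [0, 1, 1, 1, 1, 0, 0],
               [1, 0, 1, 1, 0, 1, 0],
               [1, 1, 0, 1, 0, 0, 1]], dtype=float)
combos = np.array([[1.0 if j in s else 0.0 for j in range(6)]
                   for s in itertools.combinations(range(6), 3)])
```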


7. Design of maximum efficiency. Designs leading to a matrix $X'X$ of the type (1) have certain advantages, inasmuch as the variances of the individual objects are equal, as are also the covariances between all possible pairs. The variance of the estimated total weight in such a design is given by (11). To minimize this variance, the expression

(12) $r + (p - 1)\lambda$

has to be a maximum for a given value of $p$. In an arrangement of the balanced incomplete block type, or in an arrangement with all possible combinations of $k$ objects weighed at a time, (12) reduces to $rk$ and therefore increases with $rk$. This shows that the total weight is estimated with greater precision if more of the objects are weighed at a time.

If all the objects could be weighed at a time and both pans were used, some of the elements of the matrix $X$ would be $-1$ instead of 0. This would increase the value of $r$ but decrease the value of $\lambda$. To devise the best possible design, therefore, account will have to be taken of $r$ and $\lambda$ simultaneously.
BOUNDS FOR SOME FUNCTIONS USED IN SEQUENTIALLY TESTING
THE MEAN OF A POISSON DISTRIBUTION¹

By Leon H. Herbach

Brooklyn College

1. Introduction. Let $z = \log\dfrac{f(x, \lambda_1)}{f(x, \lambda_0)}$, where $f(x, \lambda_i) = e^{-\lambda_i}\lambda_i^x/x!$ $(i = 0, 1)$ is the elementary probability law of a Poisson variate $X$ under the hypothesis that the mean is equal to $\lambda_i$. Without loss of generality we shall assume $\lambda_1 > \lambda_0$.

Let $H_0$ be the hypothesis that the distribution of $X$ is given by $f(x, \lambda_0)$. Wald [1, pp. 286-287] has devised general upper and lower bounds for the probability of accepting $H_0$ when $\lambda$ is the true value of the parameter and the sequential probability ratio test is used. This probability is called the operating-characteristic function and is designated by $L(\lambda)$. Using these results, he has computed the bounds for the binomial and normal distributions [2, pp. 137-142]. We shall do the same for the Poisson distribution. The restrictions [1, p. 284, conditions I to III] under which these general limits are valid can rather easily be shown to apply to the Poisson distribution, if we make the further restriction that $E(z) \neq 0$.

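These general results are

(1) $\dfrac{1 - B^h}{\delta A^h - B^h} \le 1 - L(\lambda) \le \dfrac{1 - \eta B^h}{A^h - \eta B^h}$ if $h > 0$,

$\qquad \dfrac{1 - A^h}{\delta B^h - A^h} \le L(\lambda) \le \dfrac{1 - \eta A^h}{B^h - \eta A^h}$ if $h < 0$,

where $\alpha$, $\beta$ are the probabilities of committing errors of the first and second kind respectively, and

(2) $A = (1 - \beta)/\alpha$, $\quad B = \beta/(1 - \alpha)$, $\quad \eta = \operatorname{glb}_{\xi > 1}\ \xi E(e^{hz} \mid e^{hz} \le 1/\xi)$, $\quad \delta = \operatorname{lub}_{0 < \rho < 1}\ \rho E(e^{hz} \mid e^{hz} \ge 1/\rho)$,

and $h$ is the non-zero root of the equation $Ee^{hz} = 1$. Hence the only remaining unknowns are $\eta$ and $\delta$.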
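Before $\eta$ and $\delta$ are found, the exponent $h$ must be computed. For the Poisson law, $Ee^{hz} = \exp(-ch - \lambda + \lambda a^h)$, with $a = \lambda_1/\lambda_0$ and $c = \lambda_1 - \lambda_0$ as in Section 3. So $h$ solves $\lambda a^h = ch + \lambda$, and the non-zero root lies on the side opposite to the sign of $Ez$. The sketch finds $h$ by bisection and plugs it into the bounds (1) as written above. The function names are hypothetical. With $\eta = \delta = 1$ both bounds collapse to Wald's familiar approximation $(1 - B^h)/(A^h - B^h)$ for $1 - L(\lambda)$.

```python
import math

def solve_h(lam0: float, lam1: float, lam: float) -> float:
    """Non-zero root of phi(h) = lam*a**h - c*h - lam, i.e. E e^{hz} = 1 for Poisson(lam)."""
    a, c = lam1 / lam0, lam1 - lam0
    phi = lambda h: lam * a ** h - c * h - lam
    ez = lam * math.log(a) - c                  # E z = phi'(0)
    if ez == 0:
        raise ValueError("E z = 0 is excluded in the text")
    sign = 1.0 if ez < 0 else -1.0              # phi convex: root lies opposite to E z
    lo, hi = 1e-9, 1.0                          # bracket for |h|
    while phi(sign * hi) <= 0:
        hi *= 2.0
    for _ in range(200):                        # bisection on |h|
        mid = 0.5 * (lo + hi)
        if phi(sign * mid) < 0:
            lo = mid
        else:
            hi = mid
    return sign * 0.5 * (lo + hi)

def oc_bounds(lam0, lam1, lam, alpha, beta, eta, delta):
    """Lower and upper bounds on L(lam) from (1), given eta <= 1 <= delta."""
    A, B = (1 - beta) / alpha, beta / (1 - alpha)
    h = solve_h(lam0, lam1, lam)
    if h > 0:
        lower_1mL = (1 - B ** h) / (delta * A ** h - B ** h)
        upper_1mL = (1 - eta * B ** h) / (A ** h - eta * B ** h)
        return 1 - upper_1mL, 1 - lower_1mL
    return (1 - A ** h) / (delta * B ** h - A ** h), (1 - eta * A ** h) / (B ** h - eta * A ** h)
```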


¹ The author is indebted to Professor A. Wald for suggesting the problem which led to this note and for helpful discussions.
The following bounds to $En$, the expected number of observations required by the sequential probability ratio test defined by $\alpha$, $\beta$, have been derived [1, pp. 143-147]:

$\dfrac{L(\lambda)(\log B + \zeta') + [1 - L(\lambda)]\log A}{Ez} \le En \le \dfrac{L(\lambda)\log B + [1 - L(\lambda)](\log A + \zeta)}{Ez}$ if $Ez > 0$,

with both inequality signs reversed if $Ez < 0$, where

(3) $\zeta' = \min_{r \ge 0} E(z + r \mid z + r \le 0)$,

and

(4) $\zeta = \max_{r \ge 0} E(z - r \mid z - r \ge 0)$.

Using the limits to $L(\lambda)$, we then find $\zeta$ and $\zeta'$, which determine the bounds for $En$.

2. Special terminology. By an almost-increasing function we shall mean one that has the following properties. If $x$ is any point of discontinuity, then (a) $x + h$ is also a point of discontinuity for every integer $h$, and $x + l$ is a point of continuity if $l$ is not an integer; (b) $f(x - \epsilon) < f(x - \epsilon') < f(x)$ for $0 < \epsilon' < \epsilon < 1$; (c) $f(x - 1) < f(x)$; (d) $\lim_{\epsilon \to 0^+} f(x + \epsilon) = f(x+) < f(x)$; (e) $f(x - 1 +) < f(x +)$. It is clear that the minimum value of $f(y)$ in any closed interval $[a, b]$ is equal to $\min[f(a), f(a'+)]$. Here $a'$ is defined as $a$ if the closed interval contains no discontinuity, and as the leftmost point of discontinuity otherwise. As special cases, if $a$ is a point of discontinuity this minimum is $f(a+)$, and if $x < a < b < x + 1$ the minimum is $f(a)$.

Almost-decreasing functions are defined similarly, except that the inequalities go the other way. In this case the maximum in the interval is $\max[f(a), f(a'+)]$, and we have special cases as above.
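The minimum rule can be seen on a concrete example. The sketch uses a hypothetical sawtooth, not taken from the paper. It rises with slope 1 on each interval $(n, n+1]$ and drops at every integer, and each tooth starts 0.1 higher than the last, so conditions (a)-(e) hold. The sketch compares a brute-force grid minimum over $[a, b]$ with $\min[f(a), f(a'+)]$.

```python
import math

def f(x: float) -> float:
    """Hypothetical almost-increasing sawtooth: on (n, n+1], f = (x - n) + 0.1 n."""
    n = math.ceil(x) - 1
    return (x - n) + 0.1 * n

def f_plus(m: int) -> float:
    """Right-hand limit f(m+) at an integer discontinuity m."""
    return 0.1 * m

def predicted_min(a: float, b: float) -> float:
    """min[f(a), f(a'+)], with a' the leftmost discontinuity in [a, b] (a itself if none)."""
    first = math.ceil(a)
    if first > b:                      # no discontinuity: f increases on [a, b]
        return f(a)
    return min(f(a), f_plus(first))

def grid_min(a: float, b: float, steps: int = 100_000) -> float:
    return min(f(a + (b - a) * i / steps) for i in range(steps + 1))
```

The grid minimum approaches $f(a'+)$ from above, since a right-hand limit is not attained.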

3. The case h > 0. Since $e^z = a^x e^{-c}$, where $a = \lambda_1/\lambda_0$ and $c = \lambda_1 - \lambda_0$, the condition $e^{hz} \le 1/\xi$ may be expressed as $a^{hx}e^{-ch} \le 1/\xi$, whence

(5) $x \le \dfrac{c}{\log a} - \dfrac{\log\xi}{h\log a} = s - r$ (say).

Since $x \ge 0$, $r \le s$. Hence $0 < r \le s$. Also

(6) $Ee^{hz} = \sum_{x=0}^{\infty} (e^{-c}a^x)^h\,\dfrac{e^{-\lambda}\lambda^x}{x!} = \exp(-ch - \lambda + \lambda a^h)$,

and

(7) $E(e^{hz} \mid e^{hz} \le 1/\xi) = E(e^{hz} \mid x \le s - r)$.
From (5), $\xi = a^{rh}$, and $\xi$ times (7) becomes

(7.1) $\xi E(e^{hz} \mid e^{hz} \le 1/\xi) = a^{rh}\,\dfrac{\sum_{x=0}^{[s-r]} e^{-\lambda}\lambda^x e^{-ch}a^{xh}/x!}{\sum_{x=0}^{[s-r]} e^{-\lambda}\lambda^x/x!}$,

where $[s - r]$ is the largest integer $\le s - r$. Our problem is to minimize (7.1) with respect to $\xi$. Since $r$ is a strictly increasing function of $\xi$, this is equivalent to minimizing $a^{rh}C/D = \theta$ (say) with respect to $r$, where

$C = \sum_{x=0}^{[s-r]} \dfrac{(\lambda a^h)^x}{x!}$

and

$D = \sum_{x=0}^{[s-r]} \dfrac{\lambda^x}{x!}$.

It will be shown that (7.1) is an almost-increasing function of $r$. Therefore the minimum occurs at either $r = 0$ or $r = v+$, where $v = s - [s]$, since the saltuses occur at $r = v + k$ for $k = 0, 1, 2, \dots, [s]$.

Since $a^{rh}$ is an increasing function of $r$, and $C/D$ remains constant as long as $[s - r]$ remains constant, condition (b) is fulfilled.

Conditions (c) to (e) refer to the saltuses only; hence, to show them, we may assume without loss of generality that $r$ and $s$ are integral. We proceed by induction to show (c), using the notation $\theta(w)$ to mean the value of $\theta$ when $r = w$.

First we prove the following:

Lemma A. $\theta(s) > \theta(s - 1)$.

Proof: Since we assumed $\lambda_1 > \lambda_0$ and $h > 0$, we have $a > 1$. Hence $(1 + \lambda)a^h > 1 + \lambda a^h$, whence, a fortiori, $a^{sh} > a^{(s-1)h}(1 + \lambda a^h)/(1 + \lambda)$.

To show that if $\theta(r + 1) > \theta(r)$, then $\theta(r) > \theta(r - 1)$, we shall show that

(8) $CD + Dba^{(n+1)h} < CDa^h + Cb$

implies

(9) $CD + Dbqa^{(n+1)h} < CDa^h + Cbqa^h$,

where $n = s - r$, $b = \lambda^n/n!$, $q = \lambda/(n + 1)$.

Since, as we shall see below,

(10) $Dba^{(n+1)h}(q - 1) < Cb(qa^h - 1)$,

or

(11) $Da^{(n+1)h}(q - 1) < C(qa^h - 1)$,

addition of (8) and (10) yields the desired result, (9).

It now remains to prove (11), or that

(12)

Setting (6) equal to 1, we get $\lambda a^h = ch + \lambda$, which when substituted in (12) yields

$(ch + \lambda)^{n+1}(\lambda - n - 1)\sum_{x=0}^{n} \dfrac{\lambda^x}{x!} < \lambda^{n+1}(ch + \lambda - n - 1)\sum_{x=0}^{n} \dfrac{(ch + \lambda)^x}{x!}$.

Upon letting $\mu = ch + \lambda$, we have

$\dfrac{\lambda - (n + 1)}{\lambda^{n+1}}\sum_{x=0}^{n} \dfrac{\lambda^x}{x!} < \dfrac{\mu - (n + 1)}{\mu^{n+1}}\sum_{x=0}^{n} \dfrac{\mu^x}{x!}$, or $F(\lambda) < F(\mu)$,

say, where $F(y) = \dfrac{y - (n + 1)}{y^{n+1}}\sum_{x=0}^{n} \dfrac{y^x}{x!}$. Then our problem reduces to showing that $F(y)$ is increasing in $0 < \lambda < y < \mu$, or that the derivative with respect to $y$, $F'(y)$, is positive:

$F'(y) = -\sum_{x=0}^{n} \dfrac{(n - x)\,y^{x-n-1}}{x!} + (n + 1)\sum_{x=0}^{n-1} \dfrac{(n - x)\,y^{x-n-1}}{(x + 1)!} + (n + 1)^2 y^{-n-2}$

$\ge (n + 1)^2 y^{-n-2}$, since $n + 1 > x + 1$ for $x \le n - 1$;

$> 0$, since $y > 0$.
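Re-indexing the two sums in $F'(y)$ is easy to get wrong, so a quick check compares the closed form above with a central finite difference of $F$. The values of $n$ and $y$ are arbitrary test points.

```python
import math

def F(y: float, n: int) -> float:
    """F(y) = (y - n - 1) / y^(n+1) * sum_{x=0}^{n} y^x / x!   (y > 0)."""
    return (y - n - 1) / y ** (n + 1) * sum(y ** x / math.factorial(x) for x in range(n + 1))

def F_prime_closed(y: float, n: int) -> float:
    """F'(y) as written in the text: two sums plus (n + 1)^2 y^(-n-2)."""
    first = -sum((n - x) * y ** (x - n - 1) / math.factorial(x) for x in range(n + 1))
    second = (n + 1) * sum((n - x) * y ** (x - n - 1) / math.factorial(x + 1) for x in range(n))
    return first + second + (n + 1) ** 2 * y ** (-n - 2)

def F_prime_numeric(y: float, n: int, eps: float = 1e-6) -> float:
    return (F(y + eps, n) - F(y - eps, n)) / (2 * eps)
```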


Thus condition (c) is demonstrated. To show (d) we must show that $\theta(r+) < \theta(r)$, which means that

$a^{rh}\,\dfrac{C - ba^{nh}}{D - b} < a^{rh}\,\dfrac{C}{D}$.

But this is true if $C < Da^{nh}$, which is easily verified. Condition (e) is equivalent to showing that

$a^{(r-1)h}\,\dfrac{C}{D} < a^{rh}\,\dfrac{C - ba^{nh}}{D - b}$,

which is proved just as (c) was.
Hence,

(13) $\eta = \min\left\{e^{-ch}\,\dfrac{\sum_{x=0}^{[s]} e^{-\lambda}\lambda^x a^{hx}/x!}{\sum_{x=0}^{[s]} e^{-\lambda}\lambda^x/x!},\ \ a^{vh}e^{-ch}\,\dfrac{\sum_{x=0}^{[s]-1} e^{-\lambda}\lambda^x a^{hx}/x!}{\sum_{x=0}^{[s]-1} e^{-\lambda}\lambda^x/x!}\right\}$.

As special cases: (i) if $s$ is integral, $\eta$ is the latter with $v = 0$; and (ii) if $s < 1$, (b) is the only applicable condition and we have an ordinary increasing function, hence $\eta$ is the former.

Similarly, it may be shown that

(14) $\delta = \max\left[E(e^{hz} \mid x \ge \{s\}),\ \ a^{-\sigma h}E(e^{hz} \mid x \ge \{s\} + 1)\right]$,

where $\{s\}$ is the smallest integer $\ge s$ and $\sigma = \{s\} - s$. Here there is only one special case, namely (i). If $h < 0$, $\delta$ is the larger of the two expressions on the right side of (13), and $\eta$ is the smaller of the two corresponding expressions in (14).
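The closed forms (13) and (14) can be checked against the definitions of $\eta$ and $\delta$ in (2) by brute force over a grid of $\xi$ and $\rho$ values. The sketch assumes illustrative values $\lambda_0 = 1$, $\lambda_1 = 2$, $\lambda = 1.2$. For these $Ez < 0$, so $h > 0$, and $s = 1/\log 2$ is not an integer. The grid sizes and the truncation of the Poisson support are also assumptions of the sketch.

```python
import math

def pmf(x: int, lam: float) -> float:
    """Poisson probability e^{-lam} lam^x / x!."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def solve_h(lam0, lam1, lam):
    """Positive root of lam * a**h = c*h + lam (the case E z < 0, h > 0)."""
    a, c = lam1 / lam0, lam1 - lam0
    phi = lambda h: lam * a ** h - c * h - lam
    lo, hi = 1e-9, 1.0
    while phi(hi) <= 0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

XMAX = 40   # truncation of the Poisson support (ample for lam near 1)

def cond_mean(lam, g, keep):
    """E(g(x) | keep(x)) for Poisson(lam); None if the event carries no mass."""
    num = den = 0.0
    for x in range(XMAX):
        if keep(x):
            p = pmf(x, lam)
            num += p * g(x)
            den += p
    return num / den if den > 0 else None

def eta_delta(lam0, lam1, lam, steps=4000, tmax=4.0):
    """Brute-force eta, delta from (2) and the closed forms (13), (14)."""
    a, c = lam1 / lam0, lam1 - lam0
    h = solve_h(lam0, lam1, lam)
    ehz = lambda x: math.exp(h * (-c + x * math.log(a)))     # e^{hz} at x
    s = c / math.log(a)
    eta_bf = math.inf
    for i in range(1, steps + 1):                  # xi = a^{rh}, 0 < r <= s
        xi = a ** (h * s * i / steps)
        m = cond_mean(lam, ehz, lambda x: ehz(x) <= 1 / xi)
        if m is not None:
            eta_bf = min(eta_bf, xi * m)
    delta_bf = -math.inf
    for i in range(1, steps + 1):                  # rho = a^{-th}, 0 < t <= tmax
        rho = a ** (-h * tmax * i / steps)
        m = cond_mean(lam, ehz, lambda x: ehz(x) >= 1 / rho)
        if m is not None:
            delta_bf = max(delta_bf, rho * m)
    k, up = math.floor(s), math.ceil(s)            # [s] and {s}; s assumed non-integral
    v, sigma = s - k, up - s
    cands = [cond_mean(lam, ehz, lambda x: x <= k)]
    if k >= 1:
        cands.append(a ** (v * h) * cond_mean(lam, ehz, lambda x: x <= k - 1))
    eta_cf = min(cands)                                                       # (13)
    delta_cf = max(cond_mean(lam, ehz, lambda x: x >= up),
                   a ** (-sigma * h) * cond_mean(lam, ehz, lambda x: x >= up + 1))  # (14)
    return eta_bf, eta_cf, delta_bf, delta_cf
```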
4. Since $z = -c + x\log a$, $\zeta$ may be written

$\max_t\ \log a \cdot E(x - t \mid x \ge t)$,

where $t = (r + c)/\log a$. Hence $s = c/\log a \le t < \infty$. Therefore, if we can show that $E(x - t \mid x \ge t) = \gamma(t)$ (say) is an almost-decreasing function of $t$, we will know that $\zeta$ occurs either when $t = s$ or when $t = \{s\}+$, since, as will be seen, the jumps occur at integral $t$.

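To show (c) we make use of the following, which is easily proven:

Lemma B. Let $X$, $Y$, $Z$ each be greater than zero. Then a necessary and sufficient condition that $\dfrac{X}{Y} < \dfrac{X + Y}{Y + Z}$ is that $XZ < Y^2$.

Therefore, to show for integral $t$ that

(15) $\gamma(t) < \gamma(t - 1)$,

or that

$\dfrac{\sum_{x=t}^{\infty} (x - t)\lambda^x/x!}{\sum_{x=t}^{\infty} \lambda^x/x!} < \dfrac{\sum_{x=t-1}^{\infty} (x - t + 1)\lambda^x/x!}{\sum_{x=t-1}^{\infty} \lambda^x/x!}$,

we apply Lemma B with $X = \sum_{x=t}^{\infty}(x - t)\lambda^x/x!$, $Y = \sum_{x=t}^{\infty}\lambda^x/x!$ and $Z = \lambda^{t-1}/(t - 1)!$. It then suffices to show that, for all integral $t$,

(16) $\dfrac{\lambda^{t-1}}{(t - 1)!}\sum_{x=t}^{\infty} \dfrac{(x - t)\lambda^x}{x!} < \left(\sum_{x=t}^{\infty} \dfrac{\lambda^x}{x!}\right)^2$.

Since both sides of (16) are power series in $\lambda$ whose exponents start with $2t$, we need only show that the coefficient of every term on the left is less than the corresponding coefficient on the right.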

In the case of the coefficient of $\lambda^{2t+2j}$ $(j \ge 0)$ we have to show that

$\dfrac{2j + 1}{(t + 2j + 1)!\,(t - 1)!} < \dfrac{2}{(t + 2j)!\,t!} + \dfrac{2}{(t + 2j - 1)!\,(t + 1)!} + \cdots + \dfrac{1}{(t + j)!\,(t + j)!}$,

or, by multiplying both sides by $(2t + 2j)!$, that

$(2j + 1)\dbinom{2t + 2j}{t - 1} < 2\dbinom{2t + 2j}{t} + 2\dbinom{2t + 2j}{t + 1} + \cdots + 2\dbinom{2t + 2j}{t + j - 1} + \dbinom{2t + 2j}{t + j} = M$, say.

Replacing all the binomial coefficients on the right by the smallest one, we have

$M \ge (2j + 1)\dbinom{2t + 2j}{t} > (2j + 1)\dbinom{2t + 2j}{t - 1}$,

since

$\dbinom{n}{s - 1} < \dbinom{n}{s}$ for $n \ge 2s$.

Thus the truth of (16) has been established for even exponents. The odd terms are treated similarly.
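The coefficient comparison behind (16) can be checked with exact rational arithmetic. The sketch extracts the coefficient of $\lambda^m$ on each side of (16) for small $t$, covering both even and odd exponents. It also checks (15) directly in floating point for a few illustrative values of $\lambda$.

```python
import math
from fractions import Fraction

def lhs_coeff(t: int, m: int) -> Fraction:
    """Coefficient of lam^m in lam^(t-1)/(t-1)! * sum_{x>=t} (x - t) lam^x / x!."""
    x = m - (t - 1)
    if x < t:
        return Fraction(0)
    return Fraction(x - t, math.factorial(t - 1) * math.factorial(x))

def rhs_coeff(t: int, m: int) -> Fraction:
    """Coefficient of lam^m in (sum_{x>=t} lam^x / x!)^2."""
    return sum((Fraction(1, math.factorial(x) * math.factorial(m - x))
                for x in range(t, m - t + 1)), Fraction(0))

def gamma(t: int, lam: float, xmax: int = 120) -> float:
    """gamma(t) = E(x - t | x >= t) for a Poisson(lam) variate, support truncated at xmax."""
    xs = range(t, xmax)
    w = [math.exp(x * math.log(lam) - math.lgamma(x + 1)) for x in xs]
    return sum((x - t) * wx for x, wx in zip(xs, w)) / sum(w)
```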

Hence we have shown that $\gamma(t)$ is a strictly decreasing function of $t$ if $t$ takes on integral values only. We shall now show (b), i.e., that

(17) $\gamma(t) = \dfrac{\sum_{x=t}^{\infty} (x - t)\lambda^x/x!}{\sum_{x=t}^{\infty} \lambda^x/x!} < \dfrac{\sum_{x \ge t-\epsilon} (x - t + \epsilon)\lambda^x/x!}{\sum_{x \ge t-\epsilon} \lambda^x/x!} = \gamma(t - \epsilon)$, $\quad 0 < \epsilon < 1$.

The denominators are equal, and each term of the numerator on the right is greater than the corresponding term on the left; hence (17) is valid.

Conditions (a) and (d) can be shown, in a similar manner, by showing that

(18) $\gamma(t+) = 1 + \gamma(t + 1)$

and $\gamma(t) < 1 + \gamma(t + 1)$ for integral $t$. By using (18) for $t$ and $t - 1$ together with (15), we show $\gamma(t - 1 +) > \gamma(t +)$, which is condition (e). Thus we have shown that


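$\zeta = \max\left\{-c + \log a\,\dfrac{\sum_{x=\{s\}}^{\infty} x\lambda^x e^{-\lambda}/x!}{\sum_{x=\{s\}}^{\infty} \lambda^x e^{-\lambda}/x!},\ \ -\{s\}\log a + \log a\,\dfrac{\sum_{x=\{s\}+1}^{\infty} x\lambda^x e^{-\lambda}/x!}{\sum_{x=\{s\}+1}^{\infty} \lambda^x e^{-\lambda}/x!}\right\}$.

As in Section 3, $\zeta'$ is the lower analogue of $\zeta$, i.e.,

$\zeta' = \min\{-c + \log a\,E(x \mid x \le [s]),\ \ -[s]\log a + \log a\,E(x \mid x \le [s] - 1)\}$,

and the special cases are as in that section.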
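A brute-force check of the closed forms for $\zeta$ and $\zeta'$ against definitions (3) and (4) follows. The sketch scans $r \ge 0$ on a grid, using the same illustrative values $\lambda_0 = 1$, $\lambda_1 = 2$, $\lambda = 1.2$, for which $[s] = 1$ and $\{s\} = 2$. The grid range, grid size and support truncation are assumptions of the sketch.

```python
import math

def pmf(x: int, lam: float) -> float:
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

XMAX = 40   # truncation of the Poisson support

def cond_mean_x(lam, keep):
    """E(x | keep(x)) for a Poisson(lam) variate; None if the event has no mass."""
    num = den = 0.0
    for x in range(XMAX):
        if keep(x):
            p = pmf(x, lam)
            num += x * p
            den += p
    return num / den if den > 0 else None

def zeta_bruteforce(lam0, lam1, lam, rmax=4.0, steps=8000):
    """(4): max over r >= 0 of E(z - r | z - r >= 0), z = -c + x log a."""
    c, la = lam1 - lam0, math.log(lam1 / lam0)
    best = -math.inf
    for i in range(steps + 1):
        t = (rmax * i / steps + c) / la          # z - r >= 0  <=>  x >= t
        m = cond_mean_x(lam, lambda x: x >= t)
        if m is not None:
            best = max(best, la * (m - t))
    return best

def zeta_prime_bruteforce(lam0, lam1, lam, steps=8000):
    """(3): min over 0 <= r <= c of E(z + r | z + r <= 0); larger r give an empty event."""
    c, la = lam1 - lam0, math.log(lam1 / lam0)
    best = math.inf
    for i in range(steps + 1):
        u = (c - c * i / steps) / la             # z + r <= 0  <=>  x <= u
        m = cond_mean_x(lam, lambda x: x <= u)
        if m is not None:
            best = min(best, la * (m - u))
    return best

def zeta_formulas(lam0, lam1, lam):
    """Closed forms of Section 4; assumes s non-integral and [s] >= 1."""
    c, la = lam1 - lam0, math.log(lam1 / lam0)
    s = c / la
    up, down = math.ceil(s), math.floor(s)       # {s} and [s]
    zeta = max(-c + la * cond_mean_x(lam, lambda x: x >= up),
               -up * la + la * cond_mean_x(lam, lambda x: x >= up + 1))
    zeta_p = min(-c + la * cond_mean_x(lam, lambda x: x <= down),
                 -down * la + la * cond_mean_x(lam, lambda x: x <= down - 1))
    return zeta, zeta_p
```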
NOTES

This section is devoted to brief research and expository articles and other short items.


THE DISTRIBUTION OF STUDENT'S t WHEN THE POPULATION MEANS
ARE UNEQUAL

By Herbert Robbins

Department of Mathematical Statistics, University of North Carolina

Let $x_1, \dots, x_N$ be independent normal variates with the same variance $\sigma^2$ and with means $\mu_1, \dots, \mu_N$ respectively. Set $n = N - 1$ and let

(1) $\bar x = \sum_{i=1}^{N} x_i/N$, $\quad s^2 = \sum_{i=1}^{N} (x_i - \bar x)^2/n$, $\quad t = N^{1/2}\bar x/s$.

If all the $\mu_i$ are 0, then $t$ has Student's distribution with $n$ degrees of freedom; its frequency function will be denoted here by

(2) $f_{n,0}(t) = n^{-1/2}B^{-1}\left(\tfrac12, \tfrac n2\right)(1 + t^2/n)^{-(n+1)/2}$.

When dealing with situations involving mixtures of populations, or in which the mean exhibits a secular trend, it is important to know the distribution of $t$ when the $\mu_i$ are arbitrary; in the general case let

(3) $\bar\mu = \sum_{i=1}^{N} \mu_i/N$, $\quad \beta^2 = \sum_{i=1}^{N} (\mu_i - \bar\mu)^2/N$, $\quad \alpha = N\bar\mu^2/2\sigma^2$, $\quad \lambda = N\beta^2/2\sigma^2$.

The distribution of $t$ will be shown to depend on the three parameters $n$, $\alpha$, $\lambda$. If $\lambda = \beta^2 = 0$, so that all the $\mu_i$ are equal, then the distribution of $t$ determines the power function of the ordinary $t$ test. We shall here consider the case in which $\alpha = \bar\mu = 0$, although the $\mu_i$ are different. Denoting the frequency function of $t$ in this case by $f_{n,\lambda}(t)$, we shall show that

(4) $f_{n,\lambda}(t) = f_{n,0}(t)\cdot\exp\left\{-\dfrac{\lambda t^2}{n + t^2}\right\}\cdot F\left(-\tfrac12, \tfrac n2, -\lambda(1 + t^2/n)^{-1}\right)$,

where $F$ denotes the confluent hypergeometric series, and where, since $\bar\mu = 0$,

(5) $\lambda = \sum_{i=1}^{N} \mu_i^2/2\sigma^2$.
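Formula (4) can be checked by simulation. The sketch draws samples with a hypothetical set of unequal means averaging zero and $\sigma = 1$, computes $t$ as in (1), and compares the empirical distribution function with numerical integrals of (4). It relies on SciPy's `stats.t.pdf`, `special.hyp1f1` (Kummer's $F$) and `integrate.quad`. The seed, the sample size and the means are illustrative choices.

```python
import numpy as np
from scipy import integrate, special, stats

def f_n_lambda(t, n, lam):
    """Frequency function (4) of t when the average of the mu_i is 0."""
    w = 1.0 / (1.0 + t * t / n)
    return stats.t.pdf(t, n) * np.exp(-lam * (1.0 - w)) * special.hyp1f1(-0.5, n / 2.0, -lam * w)

# Hypothetical unequal means with average 0; sigma = 1.
mu = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
N = mu.size
n = N - 1
lam = float(np.sum(mu ** 2) / 2.0)                         # (5) with sigma = 1

rng = np.random.default_rng(12345)
x = rng.normal(loc=mu, scale=1.0, size=(200_000, N))
t_sim = np.sqrt(N) * x.mean(axis=1) / x.std(axis=1, ddof=1)   # t as in (1)

total_mass = integrate.quad(f_n_lambda, -np.inf, np.inf, args=(n, lam))[0]
checks = {}
for c in (0.5, 1.0, 2.0, 3.0):
    exact = integrate.quad(f_n_lambda, -np.inf, c, args=(n, lam))[0]
    checks[c] = (exact, float(np.mean(t_sim <= c)))
```

Agreement is limited by Monte Carlo error, which is roughly 0.001 at this sample size.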

In fact, the general distribution of $t$, of which (4) represents the case $\alpha = 0$, may be derived as follows. Using the standard orthogonal transformation [1, p. 387], let

(6) $z_i = \sum_{j=1}^{N} c_{ij}x_j$, $\quad a_i = \sum_{j=1}^{N} c_{ij}\mu_j$ $\quad (i = 1, \dots, N)$,

where

(7) $c_{1j} = N^{-1/2}$ $\quad (j = 1, \dots, N)$;

then

(8) $t = n^{1/2}z_1\Big/\left(\sum_{i=2}^{N} z_i^2\right)^{1/2}$.

The joint frequency function of the $z_i$ is easily seen to be

(9) $(2\pi)^{-N/2}\sigma^{-N}\exp\left\{-\sum_{i=1}^{N} (z_i - a_i)^2/2\sigma^2\right\}$,

where

(10) $a_1 = N^{1/2}\bar\mu$, $\quad \sum_{i=2}^{N} a_i^2 = N\beta^2$.

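Thus $t$ is the ratio of a non-central normal variate to the square root of an independent non-central chi-square variate. It is known [2, p. 138] that the frequency function of $q = \sum_{i=2}^{N} z_i^2/\sigma^2$ is

(11) $h(q) = e^{-\lambda}\sum_{j=0}^{\infty} \dfrac{\lambda^j q^{n/2 + j - 1}e^{-q/2}}{j!\,2^{n/2 + j}\,\Gamma(n/2 + j)}$ $\quad (q > 0)$,

where

(12) $\lambda = \sum_{i=2}^{N} a_i^2/2\sigma^2 = N\beta^2/2\sigma^2$.

The frequency function of $y = z_1/\sigma$ is

$g(y) = \dfrac{1}{\sqrt{2\pi}}\exp\left\{-\dfrac{(\sigma y - a_1)^2}{2\sigma^2}\right\} = \dfrac{e^{-\alpha - y^2/2}}{\sqrt{2\pi}}\sum_{i=0}^{\infty} \dfrac{(2\alpha)^{i/2}y^i}{i!}$;

that of $q$ is $h(q)$, by (11); hence that of $u = y/q^{1/2} = n^{-1/2}t$ is

$\int_0^{\infty} h(q)\,g(uq^{1/2})\,q^{1/2}\,dq$,

which, after integration, reduces to

(13) $\dfrac{e^{-\alpha - \lambda}}{\pi^{1/2}}\sum_{j=0}^{\infty}\sum_{k=0}^{\infty} \dfrac{\lambda^j(2\alpha^{1/2}u)^k\,\Gamma(N/2 + j + k/2)}{j!\,k!\,\Gamma(n/2 + j)}\,(1 + u^2)^{-(N + 2j + k)/2}$.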
In particular, if $\alpha = \bar\mu = 0$, then (13) reduces, by means of the relation $F(a, \gamma, x) = e^xF(\gamma - a, \gamma, -x)$, to

(14) $\dfrac{1}{B\left(\tfrac12, \tfrac n2\right)}\,\dfrac{e^{-\lambda u^2/(1 + u^2)}}{(1 + u^2)^{(n+1)/2}}\,F\left(-\tfrac12, \tfrac n2, -\dfrac{\lambda}{1 + u^2}\right)$,

from which it follows that the frequency function of $t$ is given by (4).

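Again, let $x_1, \dots, x_{N_1+N_2}$ be independent normal variates with the same variance $\sigma^2$ and with means $\mu_1, \dots, \mu_{N_1+N_2}$ respectively. Set $n_1 = N_1 - 1$, $n_2 = N_2 - 1$, $n = n_1 + n_2$, and let

(15) $\bar x_1 = \sum_{i=1}^{N_1} x_i/N_1$, $\quad \bar x_2 = \sum_{i=N_1+1}^{N_1+N_2} x_i/N_2$, $\quad s_1^2 = \sum_{i=1}^{N_1} (x_i - \bar x_1)^2/n_1$, $\quad s_2^2 = \sum_{i=N_1+1}^{N_1+N_2} (x_i - \bar x_2)^2/n_2$,

$\qquad s^2 = (n_1s_1^2 + n_2s_2^2)/(n_1 + n_2)$, $\quad t = [N_1N_2/(N_1 + N_2)]^{1/2}(\bar x_1 - \bar x_2)/s$.

If all the $\mu_i$ are equal, then $t$ again has Student's distribution with $n$ degrees of freedom. In the general case let

(16) $\bar\mu_1 = \sum_{i=1}^{N_1} \mu_i/N_1$, $\quad \bar\mu_2 = \sum_{i=N_1+1}^{N_1+N_2} \mu_i/N_2$, $\quad \beta_1^2 = \sum_{i=1}^{N_1} (\mu_i - \bar\mu_1)^2/N_1$, $\quad \beta_2^2 = \sum_{i=N_1+1}^{N_1+N_2} (\mu_i - \bar\mu_2)^2/N_2$.

Then we may show as before [1, p. 388] that in this case $u = n^{-1/2}t$ has the frequency function (13), where now

(17) $N = N_1 + N_2 - 1$, $\quad \lambda = (N_1\beta_1^2 + N_2\beta_2^2)/2\sigma^2$, $\quad \alpha = [N_1N_2/(N_1 + N_2)](\bar\mu_1 - \bar\mu_2)^2/2\sigma^2$.

In particular, when $\alpha = \bar\mu_1 - \bar\mu_2 = 0$, so that $\bar\mu_1 = \bar\mu_2 = \bar\mu$, say, the frequency function $f_{n,\lambda}(t)$ of $t$ is again given by (4), where now

(18) $\lambda = \sum_{i=1}^{N_1+N_2} (\mu_i - \bar\mu)^2/2\sigma^2$.

Extensions in this direction to the general linear hypothesis in the analysis of variance will not be treated here.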
If we set

(19) $w = (1 + t^2/n)^{-1}$,

where $t$ has the frequency function (4), then $w$ will have the frequency function

(20) $g_{n,\lambda}(w) = \dfrac{1}{B\left(\tfrac n2, \tfrac12\right)}\,e^{-\lambda(1 - w)}\,w^{n/2 - 1}(1 - w)^{-1/2}\,F\left(-\tfrac12, \tfrac n2, -\lambda w\right)$

for $0 < w < 1$. Thus for every $t$,

(21) $1 - \int_{-t}^{t} f_{n,\lambda}(x)\,dx = \int_0^{(1 + t^2/n)^{-1}} g_{n,\lambda}(w)\,dw$.

It would be interesting to have numerical values of the integral on the left side of (21) for that value of $t$ for which

(22) $1 - \int_{-t}^{t} f_{n,0}(x)\,dx = 0.01$ or $0.05$ (say).

However, existing tables (e.g., those in [2] and [3]) of the integral of (20) were compiled for a different purpose and do not supply this information. The following remarks throw some light on this subject.
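Let us set

(23) $R(t) = \dfrac{f_{n,\lambda}(t)}{f_{n,0}(t)} = \exp\left\{-\dfrac{\lambda t^2/n}{1 + t^2/n}\right\}\cdot F\left(-\tfrac12, \tfrac n2, -\lambda(1 + t^2/n)^{-1}\right)$

$\qquad = \{1 - \lambda(t^2/n)/(1 + t^2/n) + o(\lambda)\}\cdot\{1 + \lambda/(n + t^2) + o(\lambda)\}$

$\qquad = 1 + \lambda(n + t^2)^{-1}(1 - t^2) + o(\lambda)$.

Then as $\lambda \to 0$ we have ultimately

(24) $R(t) > 1$ if $|t| < 1$, $\quad R(t) < 1$ if $|t| > 1$.

Hence for any $t > 1$ and for sufficiently small $\lambda$,

(25) $1 - \int_{-t}^{t} f_{n,\lambda}(x)\,dx < 1 - \int_{-t}^{t} f_{n,0}(x)\,dx$.

The exact range of values of $t$ for which $R(t) < 1$ depends, of course, on $n$ and $\lambda$. However, we shall show that always

(26) $R(t) < 1$ if $|t| \ge 1$,

so that (25) holds for all $n$ and $\lambda > 0$, provided $t \ge 1$. The proof is as follows.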
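In terms of $w$ we have

(27) $R(t) = e^{-\lambda(1 - w)}F\left(-\tfrac12, \tfrac n2, -\lambda w\right) = e^{-\lambda}F\left(\tfrac{n+1}{2}, \tfrac n2, \lambda w\right)$.

Now

(28) $F\left(\tfrac{n+1}{2}, \tfrac n2, \lambda w\right) = 1 + \sum_{k=1}^{\infty} \dfrac{(n + 1)(n + 3)\cdots(n + 2k - 1)}{n(n + 2)\cdots(n + 2k - 2)}\,\dfrac{(\lambda w)^k}{k!}$,

and by induction on $k$ we may show that for all $k = 1, 2, \dots$,

(29) $\dfrac{(n + 1)(n + 3)\cdots(n + 2k - 1)}{n(n + 2)\cdots(n + 2k - 2)} \le 1 + \dfrac kn$,

where the equality holds only for $k = 1$. Hence

(30) $F\left(\tfrac{n+1}{2}, \tfrac n2, \lambda w\right) < 1 + \sum_{k=1}^{\infty} (1 + k/n)(\lambda w)^k/k! = e^{\lambda w}(1 + \lambda w/n)$,

(31) $R(t) < e^{-\lambda(1 - w)}(1 + \lambda w/n) < e^{-\lambda(1 - w)}e^{\lambda w/n} = e^{-\lambda[1 - w(1 + 1/n)]}$.

Hence $R(t) < 1$ if $w \le n/(n + 1)$, which is equivalent to (26).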
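The quantity the note asks for can be computed numerically: the left side of (21) at the $t$ for which (22) equals 0.05. The sketch evaluates it both directly from (4) and through the $w$-integral of (20), which checks identity (21). It then confirms that the tail stays below 0.05, as (25) and (26) predict, and that $R(t) < 1$ on a grid of $|t| \ge 1$. The choices $n = 4$ and the $\lambda$ values are illustrative, and the sketch relies on SciPy's `special.hyp1f1` and `integrate.quad`.

```python
import numpy as np
from scipy import integrate, special, stats

def f_t(t, n, lam):
    """Frequency function (4) of t."""
    w = 1.0 / (1.0 + t * t / n)
    return stats.t.pdf(t, n) * np.exp(-lam * (1.0 - w)) * special.hyp1f1(-0.5, n / 2.0, -lam * w)

def g_w(w, n, lam):
    """Frequency function (20) of w = (1 + t^2/n)^(-1)."""
    return (np.exp(-lam * (1.0 - w)) * w ** (n / 2.0 - 1.0) * (1.0 - w) ** -0.5
            * special.hyp1f1(-0.5, n / 2.0, -lam * w) / special.beta(n / 2.0, 0.5))

def two_sided_tail(t0, n, lam):
    """Left side of (21), directly from (4) and via the w-integral on the right of (21)."""
    direct = 2.0 * integrate.quad(f_t, t0, np.inf, args=(n, lam))[0]
    via_w = integrate.quad(g_w, 0.0, 1.0 / (1.0 + t0 * t0 / n), args=(n, lam))[0]
    return direct, via_w

def R(t, n, lam):
    """Ratio (27) in its second form: e^{-lam} F((n+1)/2, n/2, lam w)."""
    w = 1.0 / (1.0 + t * t / n)
    return np.exp(-lam) * special.hyp1f1((n + 1) / 2.0, n / 2.0, lam * w)

n = 4
t05 = stats.t.ppf(0.975, n)          # the t for which (22) equals 0.05
tails = {lam: two_sided_tail(t05, n, lam) for lam in (0.5, 2.5, 10.0)}
```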


REFERENCES

[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton, 1946.

[2] P. C. Tang, "The power function of the analysis of variance tests with tables and illustrations of their use," Stat. Res. Memoirs, Vol. 2 (1938), pp. 127-149.

[3] Emma Lehmer, "Inverse tables of probabilities of errors of the second kind," Annals of Math. Stat., Vol. 15 (1944), pp. 388-398.


A DISTRIBUTION-FREE CONFIDENCE INTERVAL FOR THE MEAN 

By Louis Guttman 
Cornell University 

1. Summary. Consider a random sample of $N$ observations $x_1, x_2, \dots, x_N$ from a universe of mean $\mu$ and variance $\sigma^2$. Let $m$ and $s^2$ be the sample mean and variance respectively:

(1) $m = \dfrac1N\sum_{i=1}^{N} x_i$, $\quad s^2 = \dfrac1N\sum_{i=1}^{N} (x_i - m)^2$.

It is shown that the following conservative confidence interval holds for $\mu$:

(2) $\operatorname{Prob}\left\{(m - \mu)^2 \le \dfrac{s^2}{N - 1} + \lambda\sigma^2\sqrt{\dfrac{2}{N(N - 1)}}\right\} \ge 1 - \lambda^{-2}$,

where $\lambda$ is any positive constant. Inequality (2) also holds if, in the braces, $\lambda$ is replaced by $\sqrt{\lambda^2 - 1}$, with $\lambda \ge 1$.

Inequality (2) is much more efficient on the average than Tchebychef's inequality for the mean, namely,

(3) $\operatorname{Prob}\{(m - \mu)^2 \le \lambda^2\sigma^2/N\} \ge 1 - \lambda^{-2}$,

yet (2) and (3) are both distribution-free, requiring only knowledge about $\sigma^2$. At the $1 - \lambda^{-2} = .99$ level of confidence, the expected value of the right member in the braces of (2) is only about 1/6 of the corresponding member of (3); at the .999 level of confidence the ratio is about 1/20.
A more general inequality than (2) is developed, also involving only the single parameter $\sigma$.

2. Derivation. Consider the function

(4) $u = (m - \mu)^2 - s^2/(N - 1) - c\sigma^2$,

where $c$ is an arbitrary constant. It is easily verified that $Eu = -c\sigma^2$, and that

(5) $Eu^2 = \sigma^4\left[\dfrac{2}{N(N - 1)} + c^2\right]$.

A basic feature of (5) is that the only population parameter in the right member is $\sigma^2$. Contrary to what might have been surmised, the fourth moment of $x$ about $\mu$ is not involved, and indeed need not exist.
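The claim that (5) involves no fourth moment can be illustrated by simulation. The sketch estimates $Eu^2$ for two universes with $\sigma = 1$ but very different tails: a uniform on $(-\sqrt3, \sqrt3)$ and a unit exponential. Both are illustrative choices. Note that $s^2$ uses divisor $N$, as in (1).

```python
import numpy as np

def mean_u_squared(sampler, N, mu, sigma2, c=0.0, reps=400_000, seed=0):
    """Monte Carlo estimate of E u^2 for u = (m - mu)^2 - s^2/(N - 1) - c sigma^2, as in (4)."""
    rng = np.random.default_rng(seed)
    x = sampler(rng, (reps, N))
    m = x.mean(axis=1)
    s2 = x.var(axis=1)                  # divisor N, matching (1)
    u = (m - mu) ** 2 - s2 / (N - 1) - c * sigma2
    return float(np.mean(u ** 2))

N = 10
target = 2 / (N * (N - 1))              # (5) with sigma = 1 and c = 0

# Both universes have sigma = 1; their fourth moments differ widely.
uniform = lambda rng, size: rng.uniform(-np.sqrt(3), np.sqrt(3), size)   # mean 0
expo = lambda rng, size: rng.exponential(1.0, size)                      # mean 1
```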

According to Tchebychef's inequality,

(6) $\operatorname{Prob}\{-\lambda\sqrt{Eu^2} \le u \le \lambda\sqrt{Eu^2}\} \ge 1 - \lambda^{-2}$,

where $\lambda$ is an arbitrary positive number. Using (4) and (5), it is possible to write (6) as

(7) $\operatorname{Prob}\left\{\dfrac{s^2}{N - 1} + c\sigma^2 - \lambda\sigma^2\sqrt{\dfrac{2}{N(N - 1)} + c^2} \le (m - \mu)^2 \le \dfrac{s^2}{N - 1} + \sigma^2\left[c + \lambda\sqrt{\dfrac{2}{N(N - 1)} + c^2}\right]\right\} \ge 1 - \lambda^{-2}$.

In the braces of (7), if the left member is negative, there is no harm in replacing it by zero; if it is positive, then replacing it by zero can only increase the probability of the braces. Regardless of the value of this left member, it is therefore true that

(8) $\operatorname{Prob}\left\{(m - \mu)^2 \le \dfrac{s^2}{N - 1} + \sigma^2\left[c + \lambda\sqrt{\dfrac{2}{N(N - 1)} + c^2}\right]\right\} \ge 1 - \lambda^{-2}$.

If we set $c = 0$, we have inequality (2). Some improvement over (2) is obtained by determining $c$ to minimize the right member in the braces of (8), yielding as the shortest confidence interval

(9) $\operatorname{Prob}\left\{(m - \mu)^2 \le \dfrac{s^2}{N - 1} + \sigma^2\sqrt{\dfrac{2(\lambda^2 - 1)}{N(N - 1)}}\right\} \ge 1 - \lambda^{-2}$.

Inequality (9) differs from (2) only by replacing $\lambda$ in the braces by $\sqrt{\lambda^2 - 1}$.
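The minimization that leads from (8) to (9) can be checked numerically. With $A = 2/N(N-1)$, the $c$-dependent part $c + \lambda\sqrt{A + c^2}$ is convex in $c$. For $\lambda > 1$ its minimum $\sqrt{A(\lambda^2 - 1)}$ is attained at $c = -\sqrt{A/(\lambda^2 - 1)}$. The sketch finds the minimum by ternary search for illustrative values $\lambda = 10$, $N = 20$.

```python
import math

def upper_term(c: float, lam: float, N: int) -> float:
    """c + lam * sqrt(2/(N(N-1)) + c^2): the sigma^2 coefficient in the braces of (8)."""
    return c + lam * math.sqrt(2.0 / (N * (N - 1)) + c * c)

def minimise_over_c(lam: float, N: int, lo: float = -1.0, hi: float = 1.0, iters: int = 200):
    """Ternary search on the convex function upper_term(., lam, N)."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if upper_term(m1, lam, N) < upper_term(m2, lam, N):
            hi = m2
        else:
            lo = m1
    c = 0.5 * (lo + hi)
    return c, upper_term(c, lam, N)

lam, N = 10.0, 20
A = 2.0 / (N * (N - 1))
c_star, value = minimise_over_c(lam, N)
closed_form = math.sqrt(2.0 * (lam ** 2 - 1) / (N * (N - 1)))     # coefficient in (9)
```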

3. Comparison with Tchebychef's inequality. The expected value of the right member of the braces in (2) is

(10) $\sigma^2\left[1/N + \lambda\sqrt{2/N(N - 1)}\right]$.

The ratio of (10) to the corresponding value in Tchebychef's inequality (3), namely $\lambda^2\sigma^2/N$, is

(11) $\left[1 + \lambda\sqrt{2N/(N - 1)}\right]/\lambda^2$.

Since (11) decreases as $\lambda$ increases, the efficiency of inequality (2), compared with that of Tchebychef, increases as the level of confidence $1 - \lambda^{-2}$ increases. The squared interval of (2) involves only the first power of $\lambda$, while that of (3) involves the second power.
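The figures quoted in the Summary ("about 1/6" at the .99 level and "about 1/20" at the .999 level) follow from (11). The sketch evaluates the ratio at $N = 100$, an illustrative sample size, and checks that it decreases in $\lambda$.

```python
import math

def ratio_11(lam: float, N: int) -> float:
    """(11): expected right member of (2) divided by lam^2 sigma^2 / N from (3)."""
    return (1.0 + lam * math.sqrt(2.0 * N / (N - 1))) / lam ** 2

def lam_for_level(level: float) -> float:
    """The lam for which 1 - lam^(-2) equals the confidence level."""
    return 1.0 / math.sqrt(1.0 - level)

for level in (0.99, 0.999):
    lam = lam_for_level(level)
    print(f"level {level}: ratio (11) is about 1/{1 / ratio_11(lam, 100):.1f} at N = 100")
```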

4. Approach to normality. If the fourth moment of the universe's distribution exists, then it is well known that the ratio of $E(m - \mu)^4$ to $\sigma^4/N^2$ must approach 3 (the ratio for the normal distribution) as $N$ increases. That is, if $a + 1$ is the ratio, then $\lim_{N\to\infty} a = 2$. It is known¹ that Tchebychef's inequality can be replaced by one involving both $\sigma$ and $a$, namely

(12) $\operatorname{Prob}\{(m - \mu)^2 \le \sigma^2(1 + \lambda\sqrt a)/N\} \ge 1 - \lambda^{-2}$.

If $a = 2$, then the right member in the braces of (12) becomes $\sigma^2(1 + \lambda\sqrt2)/N$. This is virtually the same as (10), the expected value from (2). In a sense, then, (2) implicitly takes account of the fact that the distribution of sample means approaches the normal distribution with respect to the fourth moment. A striking feature, however, is that (2) holds for any $N > 1$ and does not even presume that the fourth moment of the universe exists. By contrast, setting $a = 2$ in (12) in general requires a large $N$ and a finite fourth moment of the universe.

5. Further possibilities. Confidence interval (2) is derived from but one of a series of general intervals, each of which depends only on $\sigma$. It may be possible to derive even more efficient intervals from this series, by the method outlined below.

One way of arriving at (2) is to consider all products of the form $(x_i - \mu)(x_j - \mu)$, where $i > j$ and $i, j = 1, 2, \ldots, N$. Let $p_2$ be the mean of these $N(N-1)/2$ products. It can easily be seen that $p_2 = u$ in (4) with $c = 0$, so $p_2$ is a second-degree polynomial in $m - \mu$ whose coefficients are sample statistics. A more general quadratic would be $u_2 = p_2 + c_1 p_1 + c_0$, where $c_1$ and $c_0$ are arbitrary constants and $p_1$ is the mean of the $N$ values $(x_i - \mu)$, that is, $p_1 = m - \mu$. It is easily seen that $Ep_1 = Ep_2 = Ep_1p_2 = 0$, and that the only universe parameter involved in $Ep_1^2$ and $Ep_2^2$ is $\sigma$. Hence the only universe parameter upon which $Eu_2^2$ depends is also $\sigma$.
Higher-degree polynomials in $m - \mu$ with the same properties as $p_2$ can be defined. Let $p_3$ be the mean of the $N(N-1)(N-2)/3!$ products of the form $(x_i - \mu)(x_j - \mu)(x_k - \mu)$, where $i > j > k$ and $i, j, k = 1, 2, \ldots, N$, and so on. Let $p_N = (x_1 - \mu)(x_2 - \mu)\cdots(x_N - \mu)$. Set $p_0 = 1$, and let

(13) $u_n = \sum_{a=0}^{n} c_{an}\, p_a \qquad (n = 1, 2, \ldots, N)$,

where the $c_{an}$ are arbitrary constants. It is easily seen that $Ep_a = 0$ $(a > 0)$, $Ep_ap_b = 0$ $(a \ne b)$, and that each $Ep_a^2$ depends on only the parameter $\sigma$ as far as the universe is concerned. Hence $Eu_n^2$ depends only on $\sigma$. Furthermore, by writing $x_i - \mu \equiv (x_i - m) + (m - \mu)$, we see that $p_a$ is a polynomial of degree $a$ in $m - \mu$ whose coefficients are sample statistics. From (13), then, $u_n$ is a polynomial of degree $n$ in $m - \mu$ with statistics as coefficients.

¹ See, for example, Louis Guttman, “An inequality for kurtosis,” Annals of Math. Stat., Vol. 19 (1948), pp. 277-278.
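The means $p_a$ are the elementary symmetric polynomials of the deviations $x_i - \mu$, divided by $\binom{N}{a}$. They can therefore be computed with the usual one-pass recurrence. The sketch below uses our own helper, made-up data and an assumed value of $\mu$. It checks numerically that $p_1 = m - \mu$, that $p_3$ agrees with a brute-force average, and, with $s^2 = \sum(x_i - m)^2/N$, that $p_2 = (m - \mu)^2 - s^2/(N-1)$, the quadratic underlying (2).

```python
import itertools
import math
import random


def symmetric_means(y, max_order):
    """p_a = mean of all products of a distinct y_i, for a = 0..max_order.

    Builds the elementary symmetric polynomials e_a of y with the recurrence
    e_a <- e_a + y_i * e_{a-1} (a descending), then divides by C(N, a)."""
    e = [1.0] + [0.0] * max_order
    for v in y:
        for a in range(max_order, 0, -1):
            e[a] += v * e[a - 1]
    n = len(y)
    return [e[a] / math.comb(n, a) for a in range(max_order + 1)]


rng = random.Random(0)
mu = 2.0                                   # assumed universe mean (illustrative)
x = [rng.gauss(mu, 1.5) for _ in range(12)]
n = len(x)
m = sum(x) / n
s2 = sum((v - m) ** 2 for v in x) / n      # s^2 with divisor N
dev = [v - mu for v in x]
p = symmetric_means(dev, 3)
brute_p3 = sum(a * b * c for a, b, c in itertools.combinations(dev, 3)) / math.comb(n, 3)
print(p[1], m - mu)
print(p[2], (m - mu) ** 2 - s2 / (n - 1))
print(p[3], brute_p3)
```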
According to Tchebychef’s inequality,

(14) $\text{Prob}\{u_n^2 \le \lambda^2 E u_n^2\} \ge 1 - \lambda^{-2}$.

The interval for $u_n$ in the braces can be expressed in two statements:

(15) $f_n(m - \mu) = u_n - \lambda\sqrt{Eu_n^2} \le 0$,

(16) $g_n(m - \mu) = u_n + \lambda\sqrt{Eu_n^2} \ge 0$.

Both $f_n$ and $g_n$ are polynomials of degree $n$ in $m - \mu$, and $g_n$ always exceeds $f_n$ by the additive constant $2\lambda\sqrt{Eu_n^2}$. Let $q_n$ and $Q_n$ be the smallest and largest real zeros, respectively, of $f_n$, and let $r_n$ and $R_n$ be the smallest and largest real zeros, respectively, of $g_n$.

For convenience, we can suppose that $c_{nn}$, the coefficient of $(m - \mu)^n$ in $u_n$, is positive. If $n$ is even, then $f_n$ is positive for $m - \mu > Q_n$ and for $m - \mu < q_n$. Hence the interval $q_n \le m - \mu \le Q_n$ contains all the points included in (15) and possibly more. Since the probability of (15) is not less than the probability of (14), we can write the following confidence interval:

(17) $\text{Prob}\{q_n \le m - \mu \le Q_n\} \ge 1 - \lambda^{-2}$ ($n$ even).

It remains to determine the $c_{an}$ so as to minimize the expected value of $Q_n - q_n$. Inequality (9) provides the minimum for the case $n = 2$. This can be verified by adding the term $c_1p_1$ to $u$ in (4) and finding that the minimum requires $c_1 = 0$.

If $n$ is odd, we may again set $c_{nn} > 0$. Then $f_n > 0$ for $m - \mu > Q_n$, and $g_n < 0$ for $m - \mu < r_n$. The interval $r_n \le m - \mu \le Q_n$ thus contains at least all the points found jointly in (15) and (16), and hence forms a conservative confidence interval:

(18) $\text{Prob}\{r_n \le m - \mu \le Q_n\} \ge 1 - \lambda^{-2}$ ($n$ odd).

Again, the problem is to determine the $c_{an}$ so as to minimize the expected value of $Q_n - r_n$. Tchebychef’s inequality (3) does this for the case $n = 1$. Although the only population parameter involved throughout is $\sigma$, the sample moments up to the $n$th order are present in (15) and (16). It thus seems plausible that inequality (9) can be improved upon for $n > 2$. Obtaining such an improvement requires a distribution-free theory of the zeros of $f_n$ and $g_n$ beyond the quadratic case.
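For the odd case $n = 1$ the zeros are explicit. Take $c_{11} = 1$ and write $c = c_{01}$. Then $u_1 = p_1 + c$ and $Eu_1^2 = \sigma^2/N + c^2$, so $Q_1 = \lambda\sqrt{\sigma^2/N + c^2} - c$ and $r_1 = -c - \lambda\sqrt{\sigma^2/N + c^2}$. The sketch below (names are ours) confirms that the width $Q_1 - r_1$ is smallest at $c = 0$. At that point (18) reduces to Tchebychef's interval $|m - \mu| \le \lambda\sigma/\sqrt{N}$.

```python
import math


def interval_n1(sigma2, n, lam, c):
    """Zeros r_1 (of g_1) and Q_1 (of f_1) for u_1 = p_1 + c, with E u_1^2 = sigma^2/N + c^2."""
    half = lam * math.sqrt(sigma2 / n + c ** 2)
    return -c - half, -c + half


sigma2, n, lam = 4.0, 25, 3.0
for c in (-0.2, 0.0, 0.2):
    r1, q1 = interval_n1(sigma2, n, lam, c)
    print(c, r1, q1, q1 - r1)
```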
ON THE COMPOUND AND GENERALIZED POISSON DISTRIBUTIONS

By E. Cansado Maceda 
University of Madrid 

1. Summary. In this note we deduce several properties of the compound and generalized Poisson distributions, in particular their closure and divisibility properties. We exhibit an infinite class of functions whose members are both compound and generalized Poisson distributions, and we identify several of the distributions of Neyman, Polya, etc. The present note stems from a paper by Feller [2].

2. The compound Poisson distribution. Let $F(x \mid a)$ be a family of distribution functions depending on the parameter $a$, and let $U(a)$ be a distribution function that assigns zero probability to any $a$-domain for which $F(x \mid a)$ is undefined. Then

$G(x) = \int F(x \mid a)\, dU(a)$

is a distribution function. In particular, if $F(x \mid a)$ is the Poisson distribution with mean $a$, and $U(0) = 0$, $G(x)$ is called the compound Poisson distribution associated with the distribution function $U(a)$; cf. Feller [2]. Clearly $G(x)$ is a step function over the non-negative integers, the saltus at the point $x = n$ being

$p_n = \int_0^\infty e^{-a}\dfrac{a^n}{n!}\, dU(a), \qquad n = 0, 1, 2, \ldots$

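It is convenient to introduce the factorial moment generating function (f.m.g.f.) for $G(x)$ as follows:

$\omega(z) = E\left((1+z)^x\right) = \sum_{n=0}^{\infty} p_n (1+z)^n = \int_0^\infty e^{az}\, dU(a) = \phi(z)$,

since $\sum_{n} e^{-a} a^n (1+z)^n/n! = e^{az}$. Here $\phi(z)$ is the ordinary moment generating function (m.g.f.) for $U(a)$. This gives a convenient relationship between the moments of $U(a)$ and those of its associated compound Poisson distribution: the factorial moments of $G(x)$ are the ordinary moments of $U(a)$.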
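A quick numerical check of $\omega(z) = \phi(z)$ uses a three-point mixing distribution $U(a)$, assumed for illustration only. The sketch computes the saltus $p_n$ directly. It then compares the f.m.g.f. of $G$ with the m.g.f. of $U$, and the second factorial moment of $G$ with the second moment of $U$.

```python
import math


def compound_poisson_pmf(atoms, weights, n_max):
    """p_n = sum_k w_k e^{-a_k} a_k^n / n! for a discrete mixing distribution U(a)."""
    return [sum(w * math.exp(-a) * a ** n / math.factorial(n) for a, w in zip(atoms, weights))
            for n in range(n_max + 1)]


atoms, weights = [0.5, 2.0, 4.0], [0.2, 0.5, 0.3]      # illustrative U(a)
p = compound_poisson_pmf(atoms, weights, 80)
z = 0.3
fmgf = sum(pn * (1 + z) ** n for n, pn in enumerate(p))            # omega(z)
mgf_u = sum(w * math.exp(a * z) for a, w in zip(atoms, weights))   # phi(z)
fact2 = sum(n * (n - 1) * pn for n, pn in enumerate(p))            # E[X(X-1)] under G
mom2 = sum(w * a ** 2 for a, w in zip(atoms, weights))             # E[a^2] under U
print(fmgf, mgf_u, fact2, mom2)
```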

Because $\omega(z)$ and $\phi(z)$ multiply under convolution of $G(x)$ and of $U(a)$ respectively, the compound Poisson distributions form a closed family. If $G_1(x)$ and $G_2(x)$ are two compound Poisson distributions associated with $U_1(a)$ and $U_2(a)$ respectively, then $G_1(x) * G_2(x)$ is associated with $U_1(a) * U_2(a)$. In addition, if $U(a)$ is infinitely divisible (cf. Cramér [1]), then so is $G(x)$, since it can be factored into the convolution of arbitrarily many compound Poisson distributions.

If in particular $U(a)$ is chosen as the Pearson Type III distribution, the associated function is the Polya-Eggenberger distribution. If $U(a)$ is a Poisson distribution, the associated function is the Neyman contagious distribution of Type A.

3. The generalized Poisson distribution. Let $F(x \mid a)$, defined for non-negative integers $a = 0, 1, 2, \ldots$, be the $a$-fold convolution of a given distribution $F(x)$ with itself, i.e., $F(x \mid a) = F(x)^{*a}$, and let $U(a)$ be the Poisson distribution with parameter $\alpha$. Then the distribution function

$G(x) = \int_0^\infty F(x \mid a)\, dU(a)$

is called the generalized Poisson distribution associated with $F(x)$.

If $\Omega(z)$ is the f.m.g.f. of $F(x)$, then the f.m.g.f. of $G(x)$ is

$\omega(z) = \sum_{n=0}^{\infty} e^{-\alpha}\dfrac{\alpha^n}{n!}\,[\Omega(z)]^n = e^{\alpha(\Omega(z)-1)}$.

It follows that, for every positive integer $k$, $\omega(z)$ can be written as $\prod_{\nu=1}^{k}\omega_\nu(z)$, where each $\omega_\nu(z) = e^{(\alpha/k)(\Omega(z)-1)}$ is the f.m.g.f. of a generalized Poisson distribution. Thus $\omega(z)$ belongs to the infinitely divisible family. Moreover, let $G_1(x)$ and $G_2(x)$ be two generalized Poisson distributions associated with $F_1(x)$ and $F_2(x)$, with parameters $\alpha_1$ and $\alpha_2$ respectively. Then $G(x) = G_1(x) * G_2(x)$ has f.m.g.f.

$\omega_1(z)\,\omega_2(z) = \exp\left\{(\alpha_1 + \alpha_2)\left[\dfrac{\alpha_1\Omega_1(z) + \alpha_2\Omega_2(z)}{\alpha_1 + \alpha_2} - 1\right]\right\}$,

and $G(x)$ is again a generalized Poisson distribution function, associated with the distribution

$F(x) = \dfrac{\alpha_1 F_1(x) + \alpha_2 F_2(x)}{\alpha_1 + \alpha_2}$

and with the parameter $\alpha_1 + \alpha_2$. Thus the generalized Poisson distributions form a closed family. The analytic nature of the generalized Poisson distributions has been studied by Hartman and Wintner [3]. As noted by Feller [2], the various Neyman contagious distributions are generalized Poisson distributions.
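The closure property can be checked numerically. The sketch below computes generalized Poisson probabilities with Panjer's recursion for a Poisson number of summands, $g_0 = e^{\alpha(f_0 - 1)}$ and $g_n = (\alpha/n)\sum_{j=1}^{n} j f_j g_{n-j}$. This is a standard technique, not one used in the note. The two distributions $F_1$, $F_2$ on the non-negative integers are chosen for illustration. The code compares $G_1 * G_2$ with the generalized Poisson distribution of parameter $\alpha_1 + \alpha_2$ associated with the mixture $(\alpha_1F_1 + \alpha_2F_2)/(\alpha_1 + \alpha_2)$.

```python
import math


def generalized_poisson_pmf(alpha, f, n_max):
    """P{G = n}, n = 0..n_max, for a Poisson(alpha) number of summands with pmf f (Panjer recursion)."""
    f = list(f) + [0.0] * (n_max + 1)          # pad so f[j] exists for every j <= n_max
    g = [math.exp(alpha * (f[0] - 1.0))] + [0.0] * n_max
    for n in range(1, n_max + 1):
        g[n] = alpha / n * sum(j * f[j] * g[n - j] for j in range(1, n + 1))
    return g


def convolve(a, b, n_max):
    """First n_max + 1 terms of the convolution of two pmfs on 0, 1, 2, ..."""
    return [sum(a[i] * b[n - i] for i in range(n + 1)) for n in range(n_max + 1)]


n_max = 30
alpha1, f1 = 1.5, [0.2, 0.5, 0.3, 0.0]     # illustrative F_1
alpha2, f2 = 0.7, [0.0, 0.1, 0.0, 0.9]     # illustrative F_2
lhs = convolve(generalized_poisson_pmf(alpha1, f1, n_max),
               generalized_poisson_pmf(alpha2, f2, n_max), n_max)
mix = [(alpha1 * a + alpha2 * b) / (alpha1 + alpha2) for a, b in zip(f1, f2)]
rhs = generalized_poisson_pmf(alpha1 + alpha2, mix, n_max)
print(max(abs(u - v) for u, v in zip(lhs, rhs)))
```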

4. Further remarks. From the above observations it is clear that a distribution is a compound Poisson distribution if and only if its f.m.g.f. is of the form

(1) $\omega_1(z) = \phi(z)$,

where $\phi(z)$ is the ordinary m.g.f. of a non-negative random variable. Likewise, $\omega(z)$ is the f.m.g.f. of a generalized Poisson distribution if and only if it is of the form

(2) $\omega_2(z) = e^{\alpha(\Omega(z)-1)}, \qquad \alpha > 0$,

where $\Omega(z)$ is the f.m.g.f. of an arbitrary distribution function $F(x)$. If we choose $\phi(z) = e^{\alpha(e^{\beta z}-1)}$ and $\Omega(z) = e^{\beta z}$, then $\omega_1(z) = \omega_2(z)$. The distribution whose f.m.g.f. is $\omega_1(z)$ (the Neyman contagious distribution of Type A) is therefore simultaneously a compound and a generalized Poisson distribution (cf. Feller [2]).
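Both descriptions of the Neyman Type A distribution can be compared numerically, with illustrative parameter values. The compound form mixes Poisson($a$) over $a = \beta K$ with $K$ Poisson($\alpha$); the m.g.f. of $a$ is $e^{\alpha(e^{\beta z}-1)}$. The generalized form is a Poisson($\alpha$) sum of Poisson($\beta$) variables, whose f.m.g.f. is $\Omega(z) = e^{\beta z}$. The sketch evaluates the generalized form with Panjer's recursion, which is our choice of method.

```python
import math


def poisson_pmf(mean, n):
    return math.exp(-mean) * mean ** n / math.factorial(n)


def neyman_a_compound(alpha, beta, n_max, k_max=60):
    """Compound form: Poisson(a) mixed over a = beta * K, K ~ Poisson(alpha), truncated at K = k_max."""
    return [sum(poisson_pmf(alpha, k) * poisson_pmf(beta * k, n) for k in range(k_max + 1))
            for n in range(n_max + 1)]


def neyman_a_generalized(alpha, beta, n_max):
    """Generalized form: Poisson(alpha) number of Poisson(beta) summands (Panjer recursion)."""
    f = [poisson_pmf(beta, j) for j in range(n_max + 1)]
    g = [math.exp(alpha * (f[0] - 1.0))] + [0.0] * n_max
    for n in range(1, n_max + 1):
        g[n] = alpha / n * sum(j * f[j] * g[n - j] for j in range(1, n + 1))
    return g


alpha, beta, n_max = 1.2, 2.0, 25
a = neyman_a_compound(alpha, beta, n_max)
b = neyman_a_generalized(alpha, beta, n_max)
print(max(abs(u - v) for u, v in zip(a, b)))
```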
We now show that there is an infinite class of distributions with this property. First note that if $\phi(z)$ is the m.g.f. of an arbitrary distribution, then $\exp\{\alpha(\phi(z) - 1)\}$ is also the m.g.f. of a d.f. In fact, it is the m.g.f. of the generalized Poisson distribution associated with the distribution whose m.g.f. is $\phi(z)$. Now let $\phi(z)$ be the m.g.f. of an arbitrary non-negative random variable, and define

(3) $\omega(z) = \exp\{\alpha(\phi(z) - 1)\}, \qquad \alpha > 0$.

Then $\omega(z)$ is simultaneously of the forms (1) and (2), since by (1) $\phi(z)$ is also the f.m.g.f. of a distribution function, namely the compound Poisson distribution associated with the distribution whose m.g.f. is $\phi(z)$. However, not every distribution that is both a compound and a generalized Poisson distribution can be generated in this manner. For example, the Polya-Eggenberger distribution is easily shown to be both a generalized and a compound Poisson distribution, yet its f.m.g.f.

$\omega(z) = (1 - dz)^{-h/d}, \qquad d > 0,\ h > 0$,

manifestly is not of the form (3). That form would imply that $\phi(it) = 1 - \dfrac{h}{\alpha d}\log(1 - idt)$ is a characteristic function. But $|\phi(it)|$ is unbounded as $t \to \pm\infty$, so $\phi(it)$ is not the characteristic function of a distribution.
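The two representations of the Polya-Eggenberger distribution can be checked using standard facts that the note does not derive. First, $(1-dz)^{-h/d}$ is the m.g.f. of a Pearson Type III (gamma) variable with shape $h/d$ and scale $d$. Second, it equals $e^{\alpha(\Omega(z)-1)}$ with $\alpha = (h/d)\log(1+d)$, where $\Omega$ is the f.m.g.f. of the logarithmic series distribution with parameter $q = d/(1+d)$. The sketch below uses illustrative values of $h$ and $d$. It compares the closed-form probabilities with a Panjer recursion over logarithmic summands. It also prints $|\phi(it)|$ from the last paragraph growing with $t$, taking $\alpha = 1$ for illustration.

```python
import cmath
import math


def polya_pmf(h, d, n_max):
    """Closed form: negative binomial with r = h/d and q = d/(1+d); f.m.g.f. (1 - dz)^(-h/d)."""
    r, q = h / d, d / (1 + d)
    return [math.exp(math.lgamma(r + n) - math.lgamma(r) - math.lgamma(n + 1)
                     + r * math.log(1 - q) + n * math.log(q)) for n in range(n_max + 1)]


def polya_as_generalized(h, d, n_max):
    """Poisson(alpha) sum of logarithmic-series summands, alpha = (h/d) log(1 + d) (Panjer)."""
    r, q = h / d, d / (1 + d)
    alpha = r * math.log(1 + d)
    f = [0.0] + [-q ** j / (j * math.log(1 - q)) for j in range(1, n_max + 1)]
    g = [math.exp(-alpha)] + [0.0] * n_max           # f_0 = 0, so g_0 = e^{-alpha}
    for n in range(1, n_max + 1):
        g[n] = alpha / n * sum(j * f[j] * g[n - j] for j in range(1, n + 1))
    return g


h, d, n_max = 3.0, 1.5, 40
a, b = polya_pmf(h, d, n_max), polya_as_generalized(h, d, n_max)
print(max(abs(u - v) for u, v in zip(a, b)))

# |phi(it)| = |1 - (h/(alpha d)) log(1 - i d t)| grows without bound (alpha = 1 here).
for t in (1.0, 1e3, 1e6):
    print(t, abs(1 - h / d * cmath.log(1 - 1j * d * t)))
```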
ON CONFIDENCE LIMITS FOR QUANTILES

By Gottfried E. Noether 
Columbia University 

In finding confidence limits for quantiles it is usual to determine two order statistics $Z_i$ and $Z_j$ which, with a given probability, contain the unknown quantile between them. The values of $i$ and $j$ corresponding to a given confidence coefficient can be determined with the help of the distribution laws of order statistics, as is shown, e.g., in Wilks [1]. The purpose of this note is to determine $i$ and $j$ with the help of a confidence band for the unknown cumulative distribution function.

In what follows we shall always denote the cumulative distribution function (cdf) by $F(x)$, i.e., $F(x) = P\{X \le x\}$. Then the quantile $q_p$ is determined by

(1) $F(q_p - 0) \le p \le F(q_p)$,

which reduces to

(1') $F(q_p) = p$

if $F(x)$ is continuous. Given a sample of size $n$, we can construct the sample cdf $F_n(x)$, defined by $F_n(x) = (1/n)$(number of observations $\le x$). Confidence coefficients will always be denoted by $1 - \alpha$.
Assume that we can construct two step functions $L(x)$ and $U(x)$ parallel to $F_n(x)$ such that for any fixed value $x$

(2) $P\{L(x) \le F(x) \le U(x)\} = 1 - \alpha$.

We do not require that the confidence band determined by $L(x)$ and $U(x)$ cover the graph of the unknown cdf $F(x)$ with probability $1 - \alpha$. We require only that (2) hold for any arbitrarily chosen value $x$.
Let

$L(x) = \eta_k, \quad U(x) = \theta_k \qquad \text{for } z_k \le x < z_{k+1}, \quad k = 0, 1, \ldots, n$,

where $z_k$ is the value taken by the order statistic $Z_k$, and $z_0 = -\infty$, $z_{n+1} = +\infty$. Then, if $F(x)$ is continuous, it follows from (2) that a confidence interval for $q_p$ with confidence coefficient $1 - \alpha$ is given by

(3) $Z_i \le q_p < Z_j$,

where $i$ and $j$ are determined by

(4) $\theta_{i-1} < p, \quad \theta_i > p$,

(5) $\eta_{j-1} < p, \quad \eta_j > p$.

Note that (3) represents a half-open interval. However, as long as we admit only continuous cdf’s, the confidence coefficient is unchanged if we use

(3') $Z_i < q_p < Z_j$

or

(3'') $Z_i \le q_p \le Z_j$

instead. This is no longer true if we also admit discontinuous cdf’s. In that case the confidence coefficient connected with (3') is $\le 1 - \alpha$, while that connected with (3'') is $\ge 1 - \alpha$. This follows immediately from considering the possible outcomes when (1) is true. It is the same result as that obtained by Scheffé and Tukey [2].

We shall now indicate how $\eta_k$ and $\theta_k$ can be obtained, and find their values in a particular case. For any arbitrary value $x$ we can consider $F_n(x)$ as the sample estimate of the unknown parameter $p = F(x)$ of a binomial distribution. Clopper and Pearson [3] have discussed how to find confidence intervals for the unknown parameter of a binomial variate. We can determine $\eta_k$ and $\theta_k$ accordingly. As is well known, however, (2) cannot be achieved with probability exactly equal to $1 - \alpha$, and we shall have to be satisfied with probability $\ge 1 - \alpha$. Consequently the same holds for the confidence coefficient connected with the confidence interval for $q_p$.

In many cases central confidence intervals seem more desirable than others, at least intuitively. Our method produces central confidence intervals for the unknown quantile if we use central confidence intervals in the construction of the confidence band. In that case $\eta_k$ and $\theta_k$ are determined by

(6) $\dfrac{\alpha}{2} = I_{\eta_k}(k, n - k + 1)$,

(7) $\dfrac{\alpha}{2} = I_{1-\theta_k}(n - k, k + 1)$,

except that $\eta_0 = 0$, $\theta_n = 1$ by definition, where

$I_x(p, q) = \int_0^x t^{p-1}(1-t)^{q-1}\,dt \Big/ \int_0^1 t^{p-1}(1-t)^{q-1}\,dt$

is the incomplete beta function. Scheffé [4] has pointed out how the tables of percentage points of the incomplete beta function by C. M. Thompson, etc. [5] can be used to find $\eta_k$ and $\theta_k$.
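As an illustration outside the note, the band (6)-(7) can be computed without tables. The sketch uses the integer-parameter identity $I_x(a, b) = P\{B \ge a\}$, with $B \sim$ Binomial$(a + b - 1, x)$, together with bisection. The hypothetical helpers below then pick $i$ and $j$ by (4)-(5). They also evaluate the exact coverage of $Z_i \le q_p < Z_j$ for a continuous cdf, which is $P\{i \le B \le j - 1\}$ with $B \sim$ Binomial$(n, p)$. For $n = 20$, $p = 0.5$, $\alpha = 0.05$ they return $i = 6$, $j = 15$.

```python
import math


def binom_cdf(k, n, x):
    """P{B <= k} for B ~ Binomial(n, x); equals 0 for k < 0."""
    return sum(math.comb(n, r) * x ** r * (1 - x) ** (n - r) for r in range(k + 1))


def solve_decreasing(fun, target, iters=60):
    """Bisection on [0, 1] for fun(t) = target when fun is decreasing in t."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if fun(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def central_band(n, alpha):
    """eta_k and theta_k of (6)-(7), i.e. central Clopper-Pearson limits for k successes."""
    # (6): alpha/2 = I_eta(k, n-k+1) = P{B >= k}  <=>  binom_cdf(k-1, n, eta) = 1 - alpha/2
    eta = [0.0] + [solve_decreasing(lambda t, k=k: binom_cdf(k - 1, n, t), 1 - alpha / 2)
                   for k in range(1, n + 1)]
    # (7): alpha/2 = I_{1-theta}(n-k, k+1)  <=>  binom_cdf(k, n, theta) = alpha/2
    theta = [solve_decreasing(lambda t, k=k: binom_cdf(k, n, t), alpha / 2)
             for k in range(n)] + [1.0]
    return eta, theta


def quantile_interval(n, p, alpha):
    """Indices (i, j) of (4)-(5); j = n + 1 stands for z_{n+1} = +infinity."""
    eta, theta = central_band(n, alpha)
    i = next(k for k in range(n + 1) if theta[k] > p)
    j = next((k for k in range(n + 1) if eta[k] > p), n + 1)
    return i, j


def coverage(n, p, i, j):
    """P{Z_i <= q_p < Z_j} for a continuous cdf."""
    return binom_cdf(j - 1, n, p) - binom_cdf(i - 1, n, p)


i, j = quantile_interval(20, 0.5, 0.05)
print(i, j, coverage(20, 0.5, i, j))
```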

We shall now show that, in the case of the median $M$, the solution based on (3)-(7) leads to the same confidence interval as that suggested originally by W. R. Thompson [6]. Thompson found that for $k < (n + 1)/2$

(8) $P\{Z_k < M < Z_{n-k+1}\} = 1 - 2I_{1/2}(n - k + 1, k)$,

provided the unknown distribution had a continuous cdf. (8) can be used to maximize $k$ subject to the condition that the right-hand side is $\ge 1 - \alpha$.

We shall first show that our method leads to the same kind of confidence interval, i.e., one with $i = l$, $j = n - l + 1$. This follows immediately from the fact that, by (6) and (7),

(9) $1 - \theta_k = \eta_{n-k} \qquad (k = 0, 1, \ldots, n)$.

For let

(10) $\theta_{l-1} < \tfrac12$ and $\theta_l > \tfrac12$;

then by (9) $\eta_{n-l} < \tfrac12$ and $\eta_{n-l+1} > \tfrac12$.
It remains to show that $k$ as determined by (8) equals $l$. This will be so if we can show that

(11) $I_{1/2}(n - l + 1, l) < \dfrac{\alpha}{2} < I_{1/2}(n - l, l + 1)$.

Since $I_x(p, q)$ is a monotonically increasing function of $x$, (7) and (10) give

$\dfrac{\alpha}{2} = I_{1-\theta_{l-1}}(n - l + 1, l) > I_{1/2}(n - l + 1, l)$

and

$\dfrac{\alpha}{2} = I_{1-\theta_l}(n - l, l + 1) < I_{1/2}(n - l, l + 1)$,

which proves (11).
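The equivalence can also be checked by brute force. The sketch below repeats the hypothetical band helpers so that it runs on its own. It computes Thompson's largest admissible $k$ from (8), using $I_{1/2}(n-k+1, k) = P\{B \le k-1\}$ for $B \sim$ Binomial$(n, \tfrac12)$. It then compares that $k$ with $i = l$ and $j = n - l + 1$ obtained from (4), (5) and (10), for $n = 6, \ldots, 25$ and $\alpha = 0.05$.

```python
import math


def binom_cdf(k, n, x):
    return sum(math.comb(n, r) * x ** r * (1 - x) ** (n - r) for r in range(k + 1))


def solve_decreasing(fun, target, iters=60):
    """Bisection on [0, 1] for a decreasing fun."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if fun(mid) > target else (lo, mid)
    return (lo + hi) / 2


def band_indices(n, p, alpha):
    """(i, j) from (4)-(5) with the central band (6)-(7)."""
    eta = [0.0] + [solve_decreasing(lambda t, k=k: binom_cdf(k - 1, n, t), 1 - alpha / 2)
                   for k in range(1, n + 1)]
    theta = [solve_decreasing(lambda t, k=k: binom_cdf(k, n, t), alpha / 2)
             for k in range(n)] + [1.0]
    i = next(k for k in range(n + 1) if theta[k] > p)
    j = next((k for k in range(n + 1) if eta[k] > p), n + 1)
    return i, j


def thompson_k(n, alpha):
    """Largest k < (n+1)/2 with 1 - 2 I_{1/2}(n-k+1, k) >= 1 - alpha, or None."""
    ks = [k for k in range(1, n // 2 + 1) if 1 - 2 * binom_cdf(k - 1, n, 0.5) >= 1 - alpha]
    return max(ks) if ks else None


results = {n: (thompson_k(n, 0.05), band_indices(n, 0.5, 0.05)) for n in range(6, 26)}
for n, (k, (i, j)) in results.items():
    print(n, k, i, j)
```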

In conclusion, it may be worth pointing out that the formula

$P\{Z_i < q_p < Z_j\} = I_p(i, n - i + 1) - I_p(j, n - j + 1)$,

given, e.g., in Wilks [1] for the continuous case, can be obtained by a slight modification of (6).
A LOWER BOUND FOR THE EXPECTED TRAVEL AMONG m RANDOM POINTS

By Eli S. Marks 
Bureau of the Census 

In connection with cost determinations in sampling problems, it is frequently necessary to determine the amount of travel among $m$ random sample points in an area. A lower bound for the expected value of this distance is found to be

$\dfrac{1}{2}\sqrt{A}\,\dfrac{m-1}{\sqrt{m}}$,

where $A$ is the measure of the area from which the $m$ random points are drawn.¹

If we locate $m$ points at random in a finite area $S$ (see Figure 1), we can trace a continuous path among them by starting at some point and connecting the points by line segments. The points can be connected in any order such that the path touches each point only once (unless it intersects itself at one of the random points). We are interested in a lower bound for the expected length of the shortest of the $m!$ possible paths.



Fig. 1. m Random Points in S.

The figure shows an area $S$ in which $m$ random points have been selected (with $m = 14$).

The shortest path among the $m$ points consists of $m - 1$ “links” (line segments), each joining two points. Each link can be assigned to one of its end points, leaving some pre-designated point (e.g., the $m$-th point selected) with no link assigned. The link assigned to the $i$-th random point $X_{(i)}$ must be no shorter than $r_{(i)}$, the distance from $X_{(i)}$ to the nearest of the other $(m - 1)$ points. Denoting the length of the shortest path by $L$, we have

$L \ge \sum_{i=1}^{m-1} r_{(i)}, \qquad E(L) \ge \sum_{i=1}^{m-1} E(r_{(i)})$.

Let $E_x(r_{(i)})$ be the expected value of $r_{(i)}$ conditional upon $X_{(i)}$ falling at the point $x$ in $S$, and let $F(r \mid x)$ be the conditional distribution function of $r_{(i)}$ for $X_{(i)} = x$. Thus $F(r \mid x)$ is the conditional probability that $r_{(i)} < r$. Equivalently, it is the probability that one or more of the $(m - 1)$ random points other than $X_{(i)}$ fall inside a circle $C_r$ with radius $r$ and center at $x$ (see Fig. 1). Then we have

$E_x(r_{(i)}) = \int_0^\infty r\, dF(r \mid x), \qquad F(r \mid x) = 1 - \left[\dfrac{M(S) - M(SC_r)}{M(S)}\right]^{m-1}$,

where $M(S)$ and $M(SC_r)$ are the measures of $S$ and $SC_r$. Thus $M(SC_r)/M(S)$ is the probability that a random point in $S$ falls into $C_r$.

¹ The lower bound obtained is similar in form to the expression for distance traveled among a set of random points used by Mahalanobis [2] and Jessen [1].

Let $A = M(S)$ and construct a circle $C$ with center at $x$ and radius $\rho = \sqrt{A/\pi}$. Then $M(C) = A = M(S)$. Let $d$ be the distance from $x$ to the nearest of $(m - 1)$ points selected at random from $C$, and let $G(r)$ be the distribution function of $d$. Then we have

$G(r) = 1 - \left[\dfrac{M(C) - M(CC_r)}{M(C)}\right]^{m-1}$.

For $r \le \rho$,

$M(C_rC) = M(C_r) \ge M(SC_r)$.

For $r > \rho$,

$M(C_rC) = M(C) = M(S) \ge M(SC_r)$.

Thus, since $M(C_rC) \ge M(SC_r)$, we have for all $x$ in $S$

$G(r) \ge F(r \mid x)$,

and thus

$E(d) \le E_x(r_{(i)})$.

Since $E(d) \le E_x(r_{(i)})$ for all $x$ in $S$,

$E(d) \le \dfrac{1}{A}\int_S E_x(r_{(i)})\,dx = E(r_{(i)})$,

$(m - 1)E(d) \le \sum_{i=1}^{m-1} E(r_{(i)}) \le E(L)$.


It only remains to evaluate $E(d)$, the expected distance from the center of a circle to the nearest of $(m - 1)$ random points. This can be done very easily by substituting

$A = M(C), \qquad \pi r^2 = M(C_rC) \text{ when } r \le \rho = \sqrt{A/\pi}$

in the expression for $G(r)$, to give

$G(r) = 1 - \left(1 - \dfrac{\pi r^2}{A}\right)^{m-1}$,

$G'(r) = \dfrac{2\pi r}{A}(m - 1)\left(1 - \dfrac{\pi r^2}{A}\right)^{m-2}$,

$E(d) = \int_0^\rho r\,G'(r)\,dr = \dfrac{1}{2}\sqrt{\dfrac{A}{\pi}}\;B(m, \tfrac12)$,

where $B(m, \frac12)$ is the complete Beta function. Since $\sqrt{m}\,B(m, \frac12) > \sqrt{\pi}$ for $m > 1$,

$E(d) > \dfrac{1}{2}\sqrt{\dfrac{A}{m}}$.

Thus, we have:

$E(L) > \dfrac{1}{2}\sqrt{A}\,\dfrac{m-1}{\sqrt{m}}$.
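The closed form for $E(d)$ and the inequality $\sqrt{m}\,B(m, \frac12) > \sqrt{\pi}$ are easy to confirm numerically. The sketch below (our own function names) evaluates $E(d) = \int_0^\rho (1 - G(r))\,dr$ by the midpoint rule. It compares the result with $\frac12\sqrt{A/\pi}\,B(m, \frac12)$ computed from log-gamma values, and checks that $(m-1)E(d)$ exceeds the stated bound.

```python
import math


def expected_nearest_center(area, m):
    """E(d) = 1/2 sqrt(A/pi) B(m, 1/2), with B(m, 1/2) = Gamma(m) Gamma(1/2) / Gamma(m + 1/2)."""
    log_beta = math.lgamma(m) + math.lgamma(0.5) - math.lgamma(m + 0.5)
    return 0.5 * math.sqrt(area / math.pi) * math.exp(log_beta)


def expected_nearest_center_numeric(area, m, steps=100_000):
    """Midpoint rule for E(d) = integral_0^rho (1 - G(r)) dr, 1 - G(r) = (1 - pi r^2 / A)^(m-1)."""
    rho = math.sqrt(area / math.pi)
    h = rho / steps
    return h * sum((1 - math.pi * ((k + 0.5) * h) ** 2 / area) ** (m - 1) for k in range(steps))


def travel_lower_bound(area, m):
    """(1/2) sqrt(A) (m - 1) / sqrt(m)."""
    return 0.5 * math.sqrt(area) * (m - 1) / math.sqrt(m)


for m in (2, 5, 14):
    print(m, expected_nearest_center_numeric(1.0, m), expected_nearest_center(1.0, m),
          (m - 1) * expected_nearest_center(1.0, m), travel_lower_bound(1.0, m))
```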

Clearly the development is general and applies to $m$ random points in any bounded two-dimensional Borel set. However, the lower bound obtained will, in general, be useful only when $S$ is a connected region.
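A small Monte Carlo sketch illustrates the result. It assumes $S$ is the unit square ($A = 1$) and takes $m = 6$, so that the shortest path can be found by brute force over all orderings. It checks the pointwise inequality $L \ge \sum_{i=1}^{m-1} r_{(i)}$, with the $m$-th point as the pre-designated one, and the bound on $E(L)$. The seed, $m$, and number of trials are arbitrary choices.

```python
import itertools
import math
import random


def shortest_path_length(points):
    """Length of the shortest open path through all points (brute force over orderings)."""
    m = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    best = math.inf
    for order in itertools.permutations(range(m)):
        if order[0] > order[-1]:          # a path and its reverse have the same length
            continue
        best = min(best, sum(dist[order[k]][order[k + 1]] for k in range(m - 1)))
    return best


def nearest_neighbour_sum(points):
    """Sum of r_(i), i = 1..m-1: distance from each point except the last to its nearest other point."""
    return sum(min(math.dist(p, q) for j, q in enumerate(points) if j != i)
               for i, p in enumerate(points[:-1]))


rng = random.Random(12345)
m, trials = 6, 300
lengths, pointwise_ok = [], True
for _ in range(trials):
    pts = [(rng.random(), rng.random()) for _ in range(m)]
    length = shortest_path_length(pts)
    pointwise_ok = pointwise_ok and length >= nearest_neighbour_sum(pts) - 1e-12
    lengths.append(length)
mean_length = sum(lengths) / trials
bound = 0.5 * math.sqrt(1.0) * (m - 1) / math.sqrt(m)
print(mean_length, bound, pointwise_ok)
```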
A MATRIX ARISING IN CORRELATION THEORY¹

By H. M. Bacon 
Stanford University 

1. Introduction. In the study of time series, it is frequently desirable to consider correlations between observations made in different years. Let $x_{i1}, x_{i2}, \ldots, x_{im}$ be $m$ values of the variable $x_i$, expressed as deviations from their arithmetic mean, where $x_i$ is a variable observed in the $i$th year $(i = 1, 2, \ldots, n)$.

¹ A linear correlogram is considered by Cochran in his paper “Relative accuracy of systematic and stratified random samples for a certain class of populations” (Annals of Math. Stat., Vol. 17 (1946), pp. 164-177), in which $\rho_u = 1 - u/L$. Setting $u = |i - j|$ and $L = 1/p$, we have the case considered above.
Let $\sigma_i$ be the standard deviation of $x_i$. If we denote by $r_{ij} = r_{ji}$ the correlation of $x_i$ with $x_j$, and if we assume the $x_i$ to be normally distributed, then

$z = \dfrac{1}{(2\pi)^{n/2}\,\sigma_1\sigma_2\cdots\sigma_n\sqrt{R}} \exp\left\{-\dfrac{1}{2R}\sum_{i=1}^{n}\sum_{j=1}^{n} R_{ij}\,\dfrac{x_i x_j}{\sigma_i\sigma_j}\right\}$

is the frequency function giving the distribution. Here $R$ is the determinant $|r_{ij}|$ of the correlation coefficients, and $R_{ij}$ is the cofactor of the element $r_{ij}$ in this determinant.

We may make various assumptions regarding the behavior of the correlation coefficients over the $n$ years. One assumption of some interest is that the correlation coefficients diminish in such a way that

$r_{ij} = r_{ji} = 1 - |i - j|\,p$,

where $p$ is a fixed positive number not greater than $2/(n - 1)$. Under these circumstances, we can evaluate $R$ and $R_{ij}$ in terms of $n$ and $p$.


2. Evaluation of R. Let $R(p)$ represent the determinant $R$ of order $n$ whose element in the $i$th row and $j$th column is $r_{ij} = r_{ji} = 1 - |i - j|\,p$, where, for the purpose of evaluation, $p$ is any real number. Since each two-rowed minor of $R(p)$ is divisible by $p$, $R(p)$ is divisible by $p^{n-1}$. Furthermore, since $R(p)$ is a polynomial in $p$ of degree at most $n$, we have

$R(p) = Ap^n + Bp^{n-1} = p^{n-1}(Ap + B)$.

Setting $p = 1$ and $p = -1$, we find $A + B = R(1)$ and $R(-1) = (-1)^{n-1}(-A + B)$, so that $-A + B = (-1)^{n-1}R(-1)$. By elementary methods we find that $R(1) = 2^{n-2}(3 - n)$ and $R(-1) = (-1)^{n-1}2^{n-2}(n + 1)$. Hence

$A + B = 2^{n-2}(3 - n)$

and

$-A + B = 2^{n-2}(n + 1)$.

Solving for $A$ and $B$, we find $A = -2^{n-2}(n - 1)$ and $B = 2^{n-1}$, so that

$R = R(p) = 2^{n-2}p^{n-1}[2 - (n - 1)p]$.

3. Evaluation of $R_{ij}$. Similar methods yield the following values for the cofactors $R_{ij}$ of the elements of $R$:

$R_{11} = R_{nn} = 2^{n-3}p^{n-2}[2 - (n - 2)p]$,

$R_{22} = R_{33} = \cdots = R_{n-1,n-1} = 2^{n-2}p^{n-2}[2 - (n - 1)p]$,

$R_{1n} = R_{n1} = 2^{n-3}p^{n-1}$,

$R_{i,i+1} = R_{i+1,i} = -2^{n-3}p^{n-2}[2 - (n - 1)p]$;

otherwise,

$R_{ij} = 0$.
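These closed forms are easy to confirm exactly. The sketch below (helper names are ours) builds the matrix $r_{ij} = 1 - |i-j|p$ with rational entries and computes its determinant and every cofactor by exact Gaussian elimination. It compares them with the formulas for $R$ and $R_{ij}$ above for $n = 3, \ldots, 7$ and $p = 3/10$; the cofactor formulas assume $n \ge 3$. It also checks the values of $R(1)$ and $R(-1)$ quoted in Section 2.

```python
from fractions import Fraction


def det(mat):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [row[:] for row in mat]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def corr_matrix(n, p):
    return [[1 - abs(i - j) * p for j in range(n)] for i in range(n)]


def cofactor(mat, i, j):
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(mat) if k != i]
    return (-1) ** (i + j) * det(minor)


def closed_form_r(n, p):
    return 2 ** (n - 2) * p ** (n - 1) * (2 - (n - 1) * p)


def closed_form_cofactor(n, p, i, j):
    """R_ij from Section 3, with 1-based i, j and n >= 3."""
    if i == j:
        if i in (1, n):
            return 2 ** (n - 3) * p ** (n - 2) * (2 - (n - 2) * p)
        return 2 ** (n - 2) * p ** (n - 2) * (2 - (n - 1) * p)
    if {i, j} == {1, n}:
        return 2 ** (n - 3) * p ** (n - 1)
    if abs(i - j) == 1:
        return -(2 ** (n - 3)) * p ** (n - 2) * (2 - (n - 1) * p)
    return Fraction(0)


p = Fraction(3, 10)
checks = []
for n in range(3, 8):
    mat = corr_matrix(n, p)
    checks.append(det(mat) == closed_form_r(n, p))
    checks.extend(cofactor(mat, i - 1, j - 1) == closed_form_cofactor(n, p, i, j)
                  for i in range(1, n + 1) for j in range(1, n + 1))
print(all(checks), len(checks))
```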
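4. The frequency function. The quadratic form appearing in the exponent of the frequency function can now be written as

$\displaystyle\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{R_{ij}}{R}\frac{x_ix_j}{\sigma_i\sigma_j} = \frac{2-(n-2)p}{2p[2-(n-1)p]}\left(\frac{x_1^2}{\sigma_1^2}+\frac{x_n^2}{\sigma_n^2}\right) + \frac{1}{p}\left(\frac{x_2^2}{\sigma_2^2}+\frac{x_3^2}{\sigma_3^2}+\cdots+\frac{x_{n-1}^2}{\sigma_{n-1}^2}\right) + \frac{1}{2[2-(n-1)p]}\left(\frac{x_1x_n}{\sigma_1\sigma_n}+\frac{x_nx_1}{\sigma_n\sigma_1}\right) - \frac{1}{2p}\left(\frac{x_1x_2}{\sigma_1\sigma_2}+\frac{x_2x_1}{\sigma_2\sigma_1}+\frac{x_2x_3}{\sigma_2\sigma_3}+\frac{x_3x_2}{\sigma_3\sigma_2}+\cdots+\frac{x_nx_{n-1}}{\sigma_n\sigma_{n-1}}\right)$

$\displaystyle\qquad = \frac{1}{p}\left[\frac{2-(n-2)p}{2[2-(n-1)p]}\left(\frac{x_1^2}{\sigma_1^2}+\frac{x_n^2}{\sigma_n^2}\right)+\sum_{i=2}^{n-1}\frac{x_i^2}{\sigma_i^2}-\sum_{i=1}^{n-1}\frac{x_ix_{i+1}}{\sigma_i\sigma_{i+1}}\right]+\frac{1}{2-(n-1)p}\,\frac{x_1x_n}{\sigma_1\sigma_n}$.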
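Section 4 can be checked exactly with a short sketch (names are ours). It builds the matrix with entries $R_{ij}/R$ from the cofactors of Section 3 and verifies that it is the inverse of $r_{ij} = 1 - |i - j|p$. It also verifies that the simplified right-hand side above equals $\sum_i\sum_j (R_{ij}/R)\,y_iy_j$ with $y_i = x_i/\sigma_i$, for arbitrary rational $y_i$.

```python
from fractions import Fraction


def corr_matrix(n, p):
    return [[1 - abs(i - j) * p for j in range(n)] for i in range(n)]


def inverse_from_cofactors(n, p):
    """Matrix of R_ij / R using the closed forms of Sections 2-3 (n >= 3)."""
    c = 2 - (n - 1) * p
    q = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        q[i][i] = 1 / p
        if i + 1 < n:
            q[i][i + 1] = q[i + 1][i] = -1 / (2 * p)
    q[0][0] = q[n - 1][n - 1] = (2 - (n - 2) * p) / (2 * p * c)
    q[0][n - 1] = q[n - 1][0] = 1 / (2 * c)
    return q


def simplified_form(y, p):
    """Right-hand side of the final expression of Section 4, with y_i = x_i / sigma_i."""
    n = len(y)
    c = 2 - (n - 1) * p
    inner = ((2 - (n - 2) * p) / (2 * c) * (y[0] ** 2 + y[-1] ** 2)
             + sum(v * v for v in y[1:-1])
             - sum(y[i] * y[i + 1] for i in range(n - 1)))
    return inner / p + y[0] * y[-1] / c


p, y = Fraction(1, 4), [Fraction(v) for v in (3, -1, 2, 5, -4, 1, 2)]
ok = True
for n in range(3, 8):
    r, q = corr_matrix(n, p), inverse_from_cofactors(n, p)
    identity = [[sum(q[i][k] * r[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    ok = ok and all(identity[i][j] == (1 if i == j else 0) for i in range(n) for j in range(n))
    quad = sum(q[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    ok = ok and quad == simplified_form(y[:n], p)
print(ok)
```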


5. Maximum likelihood. The expression $z$ is the likelihood of obtaining a particular set of values of the variables $x_1, x_2, \ldots, x_n$. It is often important to regard the $r_{ij}$ and the $\sigma_i$ as parameters and to choose them so that the likelihood is a maximum. If we assume $\sigma_1 = \sigma_2 = \cdots = \sigma_n = \sigma$, then

$z = \dfrac{1}{(2\pi)^{n/2}\sigma^n R^{1/2}} \exp\left\{-\dfrac{1}{2R\sigma^2}\sum_{i=1}^{n}\sum_{j=1}^{n} R_{ij}\,x_ix_j\right\}$.

In our case the question now becomes: what values of $p$ and $\sigma$ will make $z$ a maximum for given $x_i$? Necessary conditions are that $\partial z/\partial p = 0$ and $\partial z/\partial\sigma = 0$. Since $R_{ij}$ and $R$ are given in terms of $p$, the differentiation can be carried out (first take the logarithm of $z$), and the values of $p$ and $\sigma$ necessary for a maximum can be determined. It is, of course, possible that $z$ has no maximum, so the sufficiency of these values must be tested. The computations for the general case are laborious, though straightforward. Furthermore, because of the complicated coefficients in the equation to be solved for $p$, the general solution is not readily obtainable. This equation is, however, of third degree, and it can be solved in any particular case.
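In a particular case one can also bypass the cubic and maximize numerically. The sketch below uses a swapped-in technique, not the note's method. It takes the closed forms for $R$ and the quadratic form and solves $\partial z/\partial\sigma = 0$ exactly, giving $\hat\sigma^2 = Q(p)/n$ with $Q(p) = \sum\sum R_{ij}x_ix_j/R$. It then searches a grid of $p$ in $(0, 2/(n-1))$. The data vector is hypothetical.

```python
import math


def det_r(p, n):
    return 2 ** (n - 2) * p ** (n - 1) * (2 - (n - 1) * p)


def quad_form(p, x):
    """Q(p) = sum_ij (R_ij / R) x_i x_j, from the closed form of Section 4 (all sigma_i = 1)."""
    n, c = len(x), 2 - (len(x) - 1) * p
    inner = ((2 - (n - 2) * p) / (2 * c) * (x[0] ** 2 + x[-1] ** 2)
             + sum(v * v for v in x[1:-1]) - sum(x[i] * x[i + 1] for i in range(n - 1)))
    return inner / p + x[0] * x[-1] / c


def log_z(p, sigma, x):
    n = len(x)
    return (-0.5 * n * math.log(2 * math.pi) - n * math.log(sigma)
            - 0.5 * math.log(det_r(p, n)) - 0.5 * quad_form(p, x) / sigma ** 2)


def profile_fit(x, grid=4000):
    """Grid search over p with sigma^2 = Q(p)/n substituted (the solution of d log z / d sigma = 0)."""
    n = len(x)
    best = None
    for k in range(1, grid):
        p = 2.0 / (n - 1) * k / grid
        sigma = math.sqrt(quad_form(p, x) / n)
        val = log_z(p, sigma, x)
        if best is None or val > best[2]:
            best = (p, sigma, val)
    return best


x = [0.4, 0.7, 0.5, 0.1, -0.3, -0.6, -0.4]      # hypothetical observations
p_hat, sigma_hat, best = profile_fit(x)
print(p_hat, sigma_hat, best)
```

A grid search only locates the best grid point. If the maximum lies at an end of the admissible range, this is the situation the note warns about, where $z$ may have no interior maximum.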


TABLE OF NORMAL PROBABILITIES FOR INTERVALS OF VARIOUS 
LENGTHS AND LOCATIONS 

By W. J. Dixon

University of Oregon 

1. Introduction. The probability associated with a particular finite range of values is often desired. The usual tables of normal areas give values of the normal integral from a fixed limit, such as $\int_{-\infty}^{x}$ in the table by Salvosa [1]; the WPA table [2] also tabulates an integral of this kind. The author has deposited with Brown University a table of $\int_{x-\frac12 l}^{x+\frac12 l}$ for values of $x$ [0(.1)5.0] and values of $l$ [0(.1)10.0]. The values in the table may be interpreted as the probability that an observation from a normal population with unit variance will fall in an interval of length $l$ whose midpoint is at a distance $x$ from the mean. These values can be obtained by a simple computation from the existing tables. The present table was constructed because such values were being used frequently. Microfilm or photostat copies may be obtained upon request to the Brown University Library.


2. Computation. The values were obtained by taking the difference between the integrals $\int_{-\infty}^{x+\frac12 l}$ and $\int_{-\infty}^{x-\frac12 l}$, as given to six decimal places in Salvosa’s table.

Being differences, the values are subject to an error of 1 unit in the sixth place. For values of $x + \frac12 l$ greater than 5, the values can be obtained by computing $1 - \int_{-\infty}^{x-\frac12 l}$.


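The search for errors was aided by computing column sums, i.e.,

(1) $\sum_i \int_{x_i-\frac12 l}^{x_i+\frac12 l} \doteq .5n$,

where $i$ represents the row number and $n$ represents the column number. For example, $n = 17$ corresponds to the column for $l = 1.7$. The check works because the tabled function integrates to $\frac12 l$ over $x \ge 0$ and the rows are spaced .1 apart, so a column sums to roughly $10 \cdot \frac12 l = .5n$. The approximation becomes poorer as $n$ increases, but the sums were still useful for checking purposes.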
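The table entries and the column check (1) can be reproduced with the error function from Python's standard library. In this sketch (names are ours) the column sum runs over the rows $x = 0, .1, \ldots, 5.0$. For moderate $l$, most of the remaining discrepancy from $.5n$ comes from the row $x = 0$, which a trapezoidal weighting would count only half.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def table_entry(x, l):
    """Probability that a unit-variance normal observation falls in an interval of
    length l whose midpoint is at distance x from the mean."""
    return normal_cdf(x + l / 2) - normal_cdf(x - l / 2)


def column_sum(n):
    """Sum of the column for l = n/10 over the rows x = 0, .1, ..., 5.0."""
    l = n / 10
    return sum(table_entry(i / 10, l) for i in range(51))


print(round(table_entry(0.0, 2.0), 6))
for n in (5, 17, 50, 100):
    print(n, round(column_sum(n), 4), 0.5 * n)
```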


3. Example. The table has been used in studies of the expected proportion of a line covered by intervals dropped on it according to some normal probability function. Let $P_n(x)$ be the probability that the point $x$ is covered at least once when $n$ intervals are dropped on the $x$-axis. H. E. Robbins [3] gives the expression

(2) $E(F) = \dfrac{1}{L}\int_0^L P_n(x)\,dx$

for the expected proportion of a line of length $L$ covered at least once by these intervals.

Let $f(x)\,dx$ be the probability that an interval falls with its center in $dx$, and let $l$ be the length of the interval. The probability that a point $x$ will be covered by one interval dropped on the $x$-axis is

(3) $g(x) = \int_{x-\frac12 l}^{x+\frac12 l} f(t)\,dt$.

When $n$ intervals are dropped, the probability that $x$ is covered at least once is

(4) $P_n(x) = 1 - (1 - g(x))^n$,

and

(5) $E(F) = 1 - \dfrac{1}{L}\int_0^L (1 - g(x))^n\,dx$.

When $k$ groups of $n_t$ intervals are dropped according to, say, normal distributions with different means,

(6) $P_n(x) = 1 - \prod_{t=1}^{k}(1 - g_t(x))^{n_t}$,

where

(7) $g_t(x) = \int_{x-\frac12 l}^{x+\frac12 l} f_t(u)\,du$,

and we obtain

(8) $E(F) = 1 - \dfrac{1}{L}\int_0^L \prod_{t=1}^{k}(1 - g_t(x))^{n_t}\,dx$.

The values $g(x)$ are those given in the table, and they are useful in evaluating the integrals in (5) and (8) by numerical methods.
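A minimal sketch of (5) and (8) follows; the names and parameter values are ours. It assumes the interval centers are normal with the given means and standard deviations, and it uses the trapezoidal rule on $[0, L]$. Consider a single interval of length 1 whose center follows a standard normal law, on a long line with $L = 20$. The expected proportion should then be close to $\frac{1}{L}\int_0^\infty g(x)\,dx = \frac{l}{2L} = 0.025$.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def g(x, l, mean=0.0, sd=1.0):
    """(3)/(7): probability that the point x is covered by one interval of length l
    whose center is normal(mean, sd)."""
    return normal_cdf((x + l / 2 - mean) / sd) - normal_cdf((x - l / 2 - mean) / sd)


def expected_coverage(groups, l, length, steps=4000):
    """(8): E(F) = 1 - (1/L) * integral_0^L prod_t (1 - g_t(x))^{n_t} dx (trapezoidal rule).
    groups is a list of (n_t, mean_t, sd_t); a single group gives (5)."""
    h = length / steps
    total = 0.0
    for k in range(steps + 1):
        x = k * h
        prod = 1.0
        for n_t, mean_t, sd_t in groups:
            prod *= (1.0 - g(x, l, mean_t, sd_t)) ** n_t
        total += (0.5 if k in (0, steps) else 1.0) * prod
    return 1.0 - total * h / length


print(expected_coverage([(1, 0.0, 1.0)], 1.0, 20.0))
print(expected_coverage([(5, 0.0, 1.0)], 1.0, 20.0))
print(expected_coverage([(3, 2.0, 1.0), (4, 6.0, 2.0)], 1.0, 10.0))
```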
CORRECTION TO “A NOTE ON THE FUNDAMENTAL IDENTITY OF
SEQUENTIAL ANALYSIS”

By G. E. Albert
University of Tennessee

In the paper cited in the title (Annals of Math. Stat., Vol. 18 (1947), pp. 593-596), the proof of Lemma 3 is incorrect. The following correct proof is due to Mr. C. R. Blyth of the Institute of Statistics, University of North Carolina. It is easy to establish the equation

$P(n = N \mid F)\,[\phi(t_0)]^{-N} = P(n = N \mid G)\,E_{n=N}[\exp(-t_0Z_N) \mid G]$,

where $E_{n=N}(u \mid G)$ denotes the conditional expectation of $u$ under the condition that $n = N$, for any fixed integer $N$. By Wald [2], equations (2.4) and (2.6), there exists a finite constant $C$, independent of $N$, which dominates the expected values $E_{n=N}[\exp(-t_0Z_N) \mid G]$ for every $N$. Thus

(A) $P(n = N \mid F)\,[\phi(t_0)]^{-N} \le C \cdot P(n = N \mid G)$.

By Stein’s theorem [3], there is a positive number $k$ such that $E(\exp nk \mid G)$ is finite. But by (A),

$E\{\exp n[k - \log\phi(t_0)] \mid F\} \le C \cdot E(\exp nk \mid G)$,

and Lemma 3 is proved.


CORRECTION TO “ON THE CHARLIER TYPE B SERIES”

By S. Kullback
George Washington University

In the paper cited in the title (Annals of Math. Stat., Vol. 18 (1947), p. 575), the phrase “so that . . . Ri > 1” on lines 5 and 6 should be deleted. I am grateful to Prof. Ralph P. Boas, Jr. for calling this to my attention.
1. Estimation of Parameters for Truncated Multinormal Distributions. Z. W. Birnbaum, E. Paulson and F. C. Andrews, University of Washington.

Let $X^{(N)} = (X_1, \ldots, X_p, X_{p+1}, \ldots, X_N)$ be an $N$-dimensional random variable with a non-singular normal distribution, and let the expectations, variances and covariances of $X_{p+1}, \ldots, X_N$ be known. A large sample of $X^{(N)}$ is available, obtained under some side-condition on $(X_{p+1}, \ldots, X_N)$. This side-condition may be a truncation of any kind or, more generally, a selection, i.e., the imposition on $X_{p+1}, \ldots, X_N$ of a probability distribution different from the original marginal distribution. A method is developed for estimating, from such a large sample with a side condition, all the missing parameters of the original distribution of $X^{(N)}$. These are the expectations, variances and covariances of $X_1, \ldots, X_p$, and the covariances of $X_j$, $X_k$ for $j = 1, \ldots, p$ and $k = p + 1, \ldots, N$. The method does not require knowledge of the side-condition. (This paper was prepared under the sponsorship of the Office of Naval Research.)

2. A Test of the Hypothesis that a Sample of Three Came from the Same Normal Distribution. Carl A. Bennett, General Electric Company, Hanford Works, Richland, Washington.

In the control of the precision of chemical analyses performed in duplicate, a test sometimes becomes necessary as to whether three determinations can reasonably be assumed to have arisen from the same normal population. A critical region for testing this hypothesis is given by $R > R_0$, where $R = D/d$, $D$ being the maximum and $d$ the minimum difference between the three values. $R_0$ is determined by integration over the upper tail of the Cauchy distribution. It can easily be seen that this test is equivalent to a $t$-test between a sample of one and a sample of two.
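One way to see the stated equivalence is sketched below; the derivation is our own. Call the closest pair of the three values the sample of two, and the remaining value the sample of one. The $t$ statistic with one degree of freedom comparing them is $|y - \bar y_{\text{pair}}| / (s\sqrt{1 + 1/2})$ with $s = d/\sqrt{2}$; its distribution is the Cauchy distribution. A little algebra gives $t = (2R - 1)/\sqrt{3}$, so $R > R_0$ is the same event as $t > (2R_0 - 1)/\sqrt{3}$. The code checks this identity on random triples. It does not reproduce how $R_0$ itself is determined.

```python
import math
import random


def bennett_ratio(values):
    """R = D/d for three values: D = largest, d = smallest pairwise difference."""
    a, b, c = sorted(values)
    return (c - a) / min(b - a, c - b)


def one_vs_two_t(values):
    """t statistic (1 d.f.) comparing the outlying value with the mean of the closest pair."""
    a, b, c = sorted(values)
    if b - a <= c - b:
        pair, single = (a, b), c
    else:
        pair, single = (b, c), a
    s = abs(pair[1] - pair[0]) / math.sqrt(2)      # standard deviation from the pair, 1 d.f.
    return abs(single - (pair[0] + pair[1]) / 2) / (s * math.sqrt(1 + 1 / 2))


rng = random.Random(7)
triples = [[rng.gauss(10.0, 0.3) for _ in range(3)] for _ in range(1000)]
for v in triples[:3]:
    print(bennett_ratio(v), one_vs_two_t(v), (2 * bennett_ratio(v) - 1) / math.sqrt(3))
```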

3. A Note on the Application of the Abbreviated Doolittle Solution to Non-Orthogonal Analysis of Variance and Covariance. Carl A. Bennett, General Electric Company, Hanford Works, Richland, Washington.

S. S. Wilks has shown that the sums of squares needed for the tests commonly made in non-orthogonal analyses of variance or covariance can in general be reduced to the ratio of two determinants. If several determinantal operations are performed to remove the singular principal minors, the abbreviated Doolittle solution yields these sums of squares directly. Combining this technique with the calculational methods advanced by Wald and Yates greatly reduces the tedium of calculation in this type of analysis.
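The determinant-ratio idea can be illustrated on an ordinary least-squares fit. This is a generic sketch with hypothetical data, not the abstract's treatment of singular minors. Forward elimination is applied to the normal equations bordered by $X'y$ and $y'y$, which is what the abbreviated Doolittle method tabulates. Its last pivot is $\det\begin{pmatrix}X'X & X'y\\ y'X & y'y\end{pmatrix}/\det(X'X)$, the residual sum of squares.

```python
from fractions import Fraction


def forward_pivots(m):
    """Forward (Doolittle-type) elimination without row exchanges; returns the pivots."""
    a = [row[:] for row in m]
    n = len(a)
    pivots = []
    for k in range(n):
        pivots.append(a[k][k])
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
    return pivots


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Hypothetical data: intercept plus one regressor.
xs = [Fraction(v) for v in (1, 2, 4, 5, 7)]
ys = [Fraction(v) for v in (2, 3, 7, 8, 12)]
cols = [[Fraction(1)] * len(xs), xs]
xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
xty = [sum(a * b for a, b in zip(ci, ys)) for ci in cols]
yty = sum(v * v for v in ys)
augmented = [xtx[0] + [xty[0]], xtx[1] + [xty[1]], xty + [yty]]
pivots = forward_pivots(augmented)

# Direct residual sum of squares, solving the 2x2 normal equations by Cramer's rule.
d = det2(xtx)
b0 = (xtx[1][1] * xty[0] - xtx[0][1] * xty[1]) / d
b1 = (xtx[0][0] * xty[1] - xtx[0][1] * xty[0]) / d
residual_ss_direct = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
print(pivots[-1], det3(augmented) / det2(xtx), residual_ss_direct)
```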

4. Yield Trials with Backcrossed Derived Lines of Wheat. G. A. Baker and F. N. Briggs, University of California, Davis.

Strains of White Federation 38 and Baart 38 wheats, derived by backcrossing sufficiently to ensure a high degree of homogeneity for all genetic factors, were grown in conventional yield trials. The results were somewhat contradictory and led to a critical examination of such trials. The assumption that the deviations of yields in field trials from the specified pattern are random, with uniform variance and expectation zero, is not sufficiently realistic. We are led to consider a mathematical model which assumes a set of fertility levels upon which a random element is superimposed. On the basis of this model it is possible to account for the low observed correlations between residuals and plot yields. In such a model the variance ratio F may be approximately unbiased, but then its variance is smaller than under conventional assumptions. On the other hand, the expected value of F may be greater than one, and large enough that “significant differences” between strains will always be found because of the differences in fertility levels. In such cases the results of the experiment may be misinterpreted. Transformations, in the ordinary sense of the word, will not bring such data into conformity with the conventional model. To bring the correlation between residuals and plot yields down to a sufficiently low level, most of the variation in fertility levels must be concentrated in a few plots. Agronomic observations indicate that this is not unreasonable. This model also explains the absence of correlation between the yields of strains as determined in two different trials on the same set of strains.


5. The Selection of the Largest of a Number of Means. Charles M. Stein, 
University of California, Berkeley. 


Suppose $X_{ij}$, $i = 1, \ldots, p$; $j = 1, 2, \ldots$, are independently normally distributed with
means $\xi_i + \eta_j$ and variances $\sigma_i^2$, where $\xi_i, \eta_j$ are unknown but the $\sigma_i^2$ are known; $\epsilon, \alpha$ are fixed
numbers with $0 < \epsilon$, $0 < \alpha < 1$. It is desired to select, by a sequential procedure in which
we take first the observations with second subscript 1, etc., an integer $M$ among $1, \ldots, p$
such that, for every $k = 1, \ldots, p$ and $\xi_1, \ldots, \xi_p, \eta_1, \ldots$ satisfying $\xi_k \ge \xi_j + \epsilon$ for all
$j \ne k$, $P\{M = k\} \ge 1 - \alpha$. In accordance with the following rule, one decides at each stage
(after the observations with second subscript $n$) to take no more observations with certain
first subscripts. For each $n = 1, 2, \ldots$ and each $l = 1, \ldots, p$ compute a sum over $j = 1, \ldots, n$

[formula not legible in the source]

where $\bar X_j$ is the average of the observations with second subscript $j$ and $t_j$ is the number of
such observations. Continue taking observations $X_{l,n+1}$ for those $l$ for which this
expression is greater than $(\ln \alpha)/\epsilon$, but not for the others. Eventually there will be at most
one subscript $l = 1, \ldots, p$ for which one continues to take observations, and if there is one
this is chosen to be $M$. If there is none, the $l$ for which the sum is largest is chosen to be $M$.
This procedure is a straightforward application of the Lemma on p. 146 of Wald’s Sequential
Analysis, and generalizations can easily be found.


6. The Effect of Inbreeding on Height at Withers in a Herd of Jersey Cattle.
W. C. Rollins, S. W. Mead, and W. M. Regan, University of California,
Davis.

The data consist of measurements of height at withers of about 200 females for various
ages from one month to five years. The intensity of inbreeding as measured by Wright’s
coefficient of inbreeding averaged 15 per cent and reached as high as 44 per cent in a few
cases.

An intra-sire covariance analysis of height and per cent of inbreeding was made for
various ages from the first month to the fifty-fourth month.

The results of the statistical analysis indicate that the inbred animals are shorter at one
month of age and grow more slowly up to about the sixth month than do the outcrossed
animals, but that from the sixth month on the inbreds begin to catch up with the outcrossed,
so that at maturity there is no significant difference in height.
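The abstract does not give formulas, so the following is an assumption about the central step of an intra-sire analysis. It computes a regression of height on per cent of inbreeding, pooled within sire families, with deviations taken from each sire's own means. This is a minimal Python sketch, and `within_sire_regression` is a hypothetical name.

```python
def within_sire_regression(groups):
    """Pooled within-sire (intra-sire) regression slope of y on x.

    `groups` maps a sire label to a list of (x, y) pairs, e.g.
    x = per cent of inbreeding and y = height at withers at one age.
    Deviations are taken from each sire family's own means, so
    differences between sire families do not enter the slope.
    """
    sxx = sxy = 0.0
    for pairs in groups.values():
        mx = sum(x for x, _ in pairs) / len(pairs)
        my = sum(y for _, y in pairs) / len(pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / sxx
```

Repeating the calculation age by age would trace how the inbreeding effect on height changes from the first month onward.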
7. An Example of a Singular Continuous Distribution. Henry Scheffé,
University of California at Los Angeles.

Simple and “natural” examples of singular continuous probability distributions are of
pedagogical interest. They are trivially available in the $k$-variate case for $k > 1$. A
univariate example may be obtained from the notion of a sequence of independent trials of
an event with constant probability $p$ of success, a notion familiar to the student and
indispensable in elementary probability theory. The (real-valued) random variable $X$ is taken
to be the dyadic representation of the sequence of results (1 and 0, respectively, for success
and failure). It is known that $X$ has a singular continuous distribution for $p \ne 0, \tfrac12, 1$.
This result may be proved by using only the Tchebycheff inequality together with the
formulas for the mean and variance of the binomial distribution.
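The kind of Tchebycheff argument alluded to can be sketched numerically. This is our own framing, not necessarily Scheffé's proof, and the function names are hypothetical. Let $A_n$ be the set of $x$ whose first $n$ binary digits contain a proportion of 1s within $\epsilon$ of $p$, with $0 < \epsilon < |p - \tfrac12|$. The binomial mean $np$ and variance $np(1-p)$ then bound $P_p(A_n)$ from below. The same inequality at $p = \tfrac12$, which corresponds to Lebesgue measure, bounds the length of $A_n$ from above.

```python
def dyadic_value(bits):
    """Value of the binary fraction 0.b1 b2 b3 ... (1 = success, 0 = failure)."""
    return sum(b / 2 ** k for k, b in enumerate(bits, start=1))

def chebyshev_bounds(n, p, eps):
    """Bounds for A_n = {x : |S_n(x)/n - p| < eps}, S_n = number of 1s in the first n digits.

    Returns (lower bound on P_p(A_n), upper bound on the Lebesgue measure of A_n).
    Assumes 0 < eps < |p - 1/2|.
    """
    # Chebyshev under P_p: binomial variance of S_n/n is p(1 - p)/n.
    lower_p = 1 - p * (1 - p) / (n * eps ** 2)
    # On A_n, |S_n/n - 1/2| > |p - 1/2| - eps; Chebyshev under p = 1/2.
    gap = abs(p - 0.5) - eps
    upper_lebesgue = 0.25 / (n * gap ** 2)
    return lower_p, upper_lebesgue
```

With $p = 0.25$, $\epsilon = 0.1$ and $n = 1000$ the bounds are 0.98125 and about 0.011. As $n$ grows they tend to 1 and 0, so $X$ concentrates on sets of arbitrarily small length.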

8. On the Theory of Some Non-Parametric Hypotheses. Erich L. Lehmann
and Charles Stein, University of California, Berkeley, California.

For two types of non-parametric hypotheses optimum tests are derived against certain
classes of alternatives. The two kinds of hypotheses are related and may be illustrated by
the following example: (1) the joint distribution of the variables $X_1, \ldots, X_m, Y_1, \ldots,
Y_n$ is invariant under all permutations of the variables; (2) the variables are independently
and identically distributed. It is shown that the theory of optimum tests for hypotheses
of the first kind is the same as that of optimum similar tests for hypotheses of the second
kind. Most powerful tests are obtained against arbitrary simple alternatives, and in a
number of important cases most stringent tests are derived against certain composite
alternatives. For the example (1), if the distributions are restricted to probability
densities, Pitman’s test based on $\bar y - \bar x$ is most powerful against the alternatives that the X’s and
Y’s are independently normally distributed with common variance, and that $E(X_i) = \xi$,
$E(Y_i) = \eta$ where $\eta > \xi$. If $\eta - \xi$ may be positive or negative, the test based on $|\bar y - \bar x|$
is most stringent. The definitions are sufficiently general that the theory applies to both
continuous and discrete problems, and that tied observations present no difficulties. It is
shown that continuous and discrete problems may be combined. Pitman’s test, for example,
when applied to certain discrete problems, coincides with Fisher’s exact test, and when
$m = n$ the test based on $|\bar y - \bar x|$ is most stringent for hypothesis (1) against a broad class of
alternatives which includes both discrete and absolutely continuous distributions.
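Pitman’s test based on $\bar y - \bar x$ is a permutation test. Under hypothesis (1), every split of the pooled observations into an x-part and a y-part is equally likely. The following is a minimal exact version in Python. It enumerates all $\binom{m+n}{n}$ splits, so it is practical only for small samples, and `pitman_test` is a hypothetical name.

```python
from fractions import Fraction
from itertools import combinations

def pitman_test(x, y, two_sided=False):
    """Exact permutation (Pitman) test of hypothesis (1) using ybar - xbar.

    Returns the p-value: the fraction of all splits of the pooled sample
    into an x-part of size m and a y-part of size n whose statistic is at
    least as large as the observed one.  With two_sided=True the
    statistic is |ybar - xbar|.
    """
    pooled = list(x) + list(y)
    m, n = len(x), len(y)
    total = Fraction(sum(pooled))

    def stat(y_sum):
        y_sum = Fraction(y_sum)
        diff = y_sum / n - (total - y_sum) / m
        return abs(diff) if two_sided else diff

    observed = stat(sum(y))
    hits = splits = 0
    for idx in combinations(range(m + n), n):
        splits += 1
        if stat(sum(pooled[i] for i in idx)) >= observed:
            hits += 1
    return Fraction(hits, splits)
```

For x = [1, 2, 3] and y = [4, 5, 6] the one-sided p-value is 1/20, and the version based on $|\bar y - \bar x|$ gives 1/10.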

9. Concerning Compound Randomization in the Binary System. John E. Walsh,
Project Rand, Santa Monica, California.

Consider a set of binary digits. The numerical deviation from $\tfrac12$ of the conditional
probability that a specified digit equals 0 is called the bias of that digit for the given
conditions on the remaining digits of the set. The maximum bias of the set is defined to be the
maximum of the biases of the digits of the set. A set of binary digits is called random if its
maximum bias is zero. Now consider an array of $(1 + t_1)\cdots(1 + t_K) \times n$ binary digits
such that the rows are statistically independent. A compounding method of obtaining a
set of $t_1 \cdots t_K n$ binary digits from the original array is presented. By suitable choices of
$K, t_1, \ldots, t_K$ the maximum bias of the compounded set can be made extremely small even
if the maximum bias of the original array is not small; this can be done so that
$t_1 \cdots t_K / [(1 + t_1)\cdots(1 + t_K)]$ is moderately large. Also a method is outlined for constructing an
approximately random binary digit table. This table has the property that the maximum
bias of a set of digits taken from the table is an increasing function of the number of digits
in the set.
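Walsh’s compounding method is not described here in enough detail to reproduce. The Python below instead shows a different, standard device: the sum mod 2 (XOR) of independent digits. It illustrates how combining independent digits can shrink bias in the sense just defined. If digit $i$ equals 0 with probability $\tfrac12 + b_i$, the XOR of $k$ such digits has signed bias $2^{k-1}\prod_i b_i$. The function name `xor_bias` is hypothetical, and the code computes the bias by exact enumeration.

```python
from fractions import Fraction
from itertools import product

def xor_bias(biases):
    """Bias |P(digit = 0) - 1/2| of the XOR (sum mod 2) of independent digits.

    Digit i equals 0 with probability 1/2 + biases[i].  All 2^k outcomes
    are enumerated, so the result is exact when Fractions are supplied.
    """
    p_zero = Fraction(0)
    for bits in product((0, 1), repeat=len(biases)):
        prob = Fraction(1)
        for bit, b in zip(bits, biases):
            prob *= (Fraction(1, 2) + b) if bit == 0 else (Fraction(1, 2) - b)
        if sum(bits) % 2 == 0:  # an even number of 1s gives XOR = 0
            p_zero += prob
    return abs(p_zero - Fraction(1, 2))
```

Three independent digits each with bias 1/4 yield a digit with bias 1/16. Each further digit of bias 1/4 halves the bias again, at the cost of using more input digits per output digit.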
10. A Multiple Decision Problem Arising in the Analysis of Variance. Edward
Paulson, University of Washington, Seattle.

In some applications of the analysis of variance, a procedure is required for classifying
varieties into ‘superior’ and ‘inferior’ groups. Consider $K$ varieties, with $x_{i\alpha}$ the $\alpha$th
observation on the $i$th variety ($\alpha = 1, 2, \ldots, r$; $i = 1, 2, \ldots, K$); let $\bar x_i = \sum_{\alpha=1}^{r} x_{i\alpha}/r$ and let $s^2$ be an
independent estimate of the variance. For the $i$th variety form the corresponding interval
$\left(\bar x_i - \lambda s/\sqrt{r},\ \bar x_i + \lambda s/\sqrt{r}\right)$. The superior group then consists of the variety with greatest
sample mean, together with those varieties whose corresponding intervals have at least one
point in common with the interval for the variety with the greatest mean. If all varieties
fall into one group, this group is labeled ‘neutral’ and the varieties are considered
homogeneous. To select $\lambda$, consider the relative importance of different incorrect classifications.
For a given $\lambda$, an explicit expression is found for $P(A)$, the probability that the varieties will not
all be classified in one group when $m_1 = m_2 = \cdots = m_K$, where $m_i = E(\bar x_i)$; also explicit
expressions are found for $P(B_1)$ and $P(B_2)$, where $P(B_1)$ is the probability that there will not be a
superior group consisting only of the $K$th variety and $P(B_2)$ is the probability that there will
not be a superior group consisting of at least the $K$th variety, when $m_1 = m_2 = \cdots = m_{K-1} =
m$ and $m_K = m + \Delta$ ($\Delta > 0$). Similar results are obtained for classifying $K$ processes
according to their variances.
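The classification rule itself is mechanical. Two intervals of equal half-width $\lambda s/\sqrt{r}$ overlap exactly when their centres differ by at most $2\lambda s/\sqrt{r}$. The following is a minimal Python sketch; the name `classify_varieties` and the "split" label are hypothetical.

```python
import math

def classify_varieties(means, s, r, lam):
    """Superior/inferior grouping of K variety means (minimal sketch).

    Each variety gets the interval mean +/- lam * s / sqrt(r).  The superior
    group is the variety with the largest mean plus every variety whose
    interval overlaps that variety's interval.  If every variety is in the
    superior group, the grouping is labelled 'neutral'.
    """
    half = lam * s / math.sqrt(r)
    best = max(range(len(means)), key=lambda i: means[i])
    # Equal half-widths: intervals overlap iff centres differ by <= 2 * half.
    superior = [i for i, m in enumerate(means) if means[best] - m <= 2 * half]
    inferior = [i for i in range(len(means)) if i not in superior]
    label = "neutral" if not inferior else "split"
    return superior, inferior, label
```

A larger $\lambda$ widens the intervals and makes a ‘neutral’ verdict more likely. This is the trade-off between $P(A)$ and $P(B_1)$, $P(B_2)$ that governs the choice of $\lambda$.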

11. Recurrence Formulae for the Moments and Semi-invariants of the Joint
Distribution of the Sample Mean and Variance. Olav Reiersøl, University
of Oslo, Norway.

Let $x_1, x_2, \ldots, x_n$ be independent and have the same distribution. We consider the
arithmetic mean $m$ and the variance $v = \frac{1}{n-1}\sum_i (x_i - m)^2$. Let $\kappa_{rs}$ denote the
semi-invariants of the joint distribution of $m$ and $v$, and let the semi-invariant generating operators
$K_i$ be defined by the equations $\kappa_{r+1,s} = K_1\kappa_{rs}$, $\kappa_{r,s+1} = K_2\kappa_{rs}$, $K_i 1 = 0$, $K_i(PQ) = P(K_iQ) +
Q(K_iP)$. An operator which operates only on the first factor of a product shall be
denoted by a prime, and an operator which operates only on the second factor shall be
denoted by a double prime. We have the following general formula, valid for any parent
distribution: [formula not legible in the source]. For $s = 0, 1, 2$ we obtain the formulae

$$\kappa_{01} - n\kappa_{20} = 0,$$

$$(n-1)(\kappa_{02} - n\kappa_{21}) - 2n^2\kappa_{20}^2 = 0,$$

and a third relation, in $\kappa_{03}$, $\kappa_{22}$ and products such as $\kappa_{20}^3$, which is not legible in the source.
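The $s = 0$ and $s = 1$ relations can be checked against standard sampling results. These are taken here as assumptions, not from the abstract: $\kappa_{20} = \kappa_2/n$, $\kappa_{01} = \kappa_2$, $\kappa_{02} = \kappa_4/n + 2\kappa_2^2/(n-1)$ and $\kappa_{21} = \kappa_4/n^2$, where $\kappa_2, \kappa_4$ are semi-invariants of the parent distribution. The following is a minimal exact-arithmetic check in Python; `check_relations` is a hypothetical name.

```python
from fractions import Fraction

def check_relations(n, k2, k4):
    """Evaluate the s = 0 and s = 1 relations for sample size n and parent semi-invariants k2, k4."""
    n, k2, k4 = Fraction(n), Fraction(k2), Fraction(k4)
    k20 = k2 / n                          # variance of the mean m
    k01 = k2                              # E(v): v is unbiased for the parent variance
    k02 = k4 / n + 2 * k2 ** 2 / (n - 1)  # variance of v
    k21 = k4 / n ** 2                     # third joint semi-invariant of (m, m, v)
    rel0 = k01 - n * k20
    rel1 = (n - 1) * (k02 - n * k21) - 2 * n ** 2 * k20 ** 2
    return rel0, rel1
```

Both relations vanish identically in $n$, $\kappa_2$ and $\kappa_4$. For example, `check_relations(7, 3, Fraction(5, 2))` returns two zeros.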

12. The Problem of Identification in Factor Analysis. Olav Reiersøl,
University of Oslo, Norway.

The paper is concerned with the multiple factor analysis of L. L. Thurstone. Thurstone
has given criteria which he says are almost certain to constitute sufficient and more than
necessary conditions for uniqueness (i.e. identifiability) of a simple structure. It is shown
that Thurstone’s criteria are not always sufficient, and conditions are derived which are
more nearly necessary and sufficient for the identifiability of a simple structure. Let $A$
be the matrix of factor loadings with $n$ rows and $r$ columns. When the communalities are
identifiable, the conditions will be: (1) Each column of $A$ should have at least $r$ zeros.
(2) Let us consider the submatrix $B$ of $A$, consisting of all the rows which have zeros in the
$k$th column. Then, for $q = 1, 2, \ldots, r - 1$, there should, for any combination of $q$ columns
different from the $k$th, exist at least $q + 1$ rows of $B$ containing non-zero elements in the $q$
columns. This should be true for any value of $k$.
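Conditions (1) and (2) can be checked mechanically for a given loading matrix. The following is a minimal Python sketch. It assumes one reading of (2): a row “contains non-zero elements in the $q$ columns” if at least one of those $q$ entries is non-zero. The name `check_simple_structure` and the tolerance parameter are hypothetical.

```python
from itertools import combinations

def check_simple_structure(A, tol=0.0):
    """Check conditions (1) and (2) for an n x r loading matrix A (list of rows).

    Assumed reading of (2): a row counts if at least one of the q chosen
    entries is non-zero.  Entries with absolute value <= tol count as zeros.
    """
    r = len(A[0])

    def is_zero(v):
        return abs(v) <= tol

    for k in range(r):
        B = [row for row in A if is_zero(row[k])]  # rows with a zero in column k
        if len(B) < r:                             # condition (1)
            return False
        others = [c for c in range(r) if c != k]
        for q in range(1, r):
            for cols in combinations(others, q):
                hits = sum(1 for row in B if any(not is_zero(row[c]) for c in cols))
                if hits < q + 1:                   # condition (2)
                    return False
    return True
```

The check is exhaustive over column subsets, which is cheap for the small numbers of factors usual in practice.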
13. Note on Distinct Hypotheses. Agnes Berger, Columbia University, New
York.

As was pointed out by Neyman, one of the difficulties which one may encounter when
devising a test to distinguish between two exhaustive and exclusive composite hypotheses
referring to the unknown distribution of a random vector $X$ is the following. If $H_0$ states
that the true distribution function of $X$ belongs to a set $\{F\}$ and $H_1$ that it belongs to a
set $\{G\}$, it may happen that to every Borel set $W$ of the sample space there exists an element
$F_W$ in $\{F\}$ and an element $G_W$ in $\{G\}$ for which the probability of the sample point $x$ falling
in $W$ is the same, and therefore independent of whether $H_0$ or $H_1$ is true. If this is the case
the pair $H_0, H_1$ is called non-distinct; otherwise they are called distinct. The existence of
non-distinct hypotheses is demonstrated by a simple example, $H_0$ consisting of one, $H_1$
of three suitably chosen step functions. It is shown, however, that if the sets $\{F\}$ and $\{G\}$
contain only continuous distribution functions and are at most enumerable, then the pair
$H_0, H_1$ is distinct. Necessary and sufficient conditions for $H_0$ and $H_1$ to be distinct were
obtained jointly with Wald for an important class of hypotheses each containing a
continuum of alternatives.
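The Python below checks the definition of non-distinct hypotheses on a three-point sample space. The distributions are our own illustrative choice and need not be the example in the paper. $H_0$ is the uniform distribution on $\{1, 2, 3\}$, and $H_1$ consists of three distributions, each putting mass 1/3 on one point and 2/3 on the next point cyclically. The hypothetical function tests whether, for every event $W$, some $F$ in $H_0$ and some $G$ in $H_1$ give $W$ the same probability.

```python
from fractions import Fraction
from itertools import combinations

def non_distinct(H0, H1):
    """True if every event W gets equal probability under some F in H0 and some G in H1.

    Each distribution is a dict {point: probability} on a common finite
    sample space; every subset W of that space is examined.
    """
    points = sorted({x for dist in H0 + H1 for x in dist})
    for size in range(len(points) + 1):
        for W in combinations(points, size):
            probs0 = {sum(F.get(x, 0) for x in W) for F in H0}
            probs1 = {sum(G.get(x, 0) for x in W) for G in H1}
            if not probs0 & probs1:  # W would separate the hypotheses
                return False
    return True
```

With all three members of $H_1$ the pair is non-distinct. With only the first member, the event $W = \{2\}$ (probability 1/3 versus 2/3) already separates them.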

14. Place of Statistical Sampling in the Education of Engineers. E. L. Grant,
Stanford University.

There is convincing evidence that many engineering problems could be solved better
with the aid of statistical methods than they are now solved without this aid. However,
few practising engineers or teachers of engineering have had any training in statistical
methods. As a result, those engineering problems which are in part statistical problems
are seldom recognized as such. Even in the field of industrial quality control, in which
successful applications of some of the simpler statistical techniques have been made in
many different industries, the surface has barely been scratched, and a serious obstacle to
progress is the lack of a widespread appreciation of the statistics point of view among
design engineers, production engineers, inspection personnel, and management.

This condition might gradually be corrected if during the next few years instruction in
statistics should be introduced into all undergraduate engineering curricula. However,
some recent discussions touching on the subject of statistics instruction for engineering
students (e.g., the report on “The Teaching of Statistics” which appeared in the March
1948 issue of the Annals of Math. Stat.) have been most unrealistic regarding the amount of
statistics instruction which could be added to engineering curricula. Those discussions
have suggested a full year of basic statistics followed by one or more courses in engineering
applications. Desirable as this arrangement might be from the point of view of the most
effective instruction in statistics, it is out of the question when considered in the light of
the many subjects which are needed in engineering curricula. Although undergraduate
engineering curricula have always been tighter than other curricula, the pressures today
are greater than ever before: for more time devoted to the humanistic-social stem, for
more time in basic mathematics and science, for introductory courses in various economic
and management subjects such as engineering economy, accounting, industrial relations,
business law, and industrial organization and management, and for more time in the various
departmental courses in engineering subjects in order to permit presentation of important
recent developments in engineering technology. Under these circumstances the most that
can be hoped for in the undergraduate program is a single statistics course for one term,
possibly three units for one semester or four units for one quarter. This should be
supplemented by additional statistics instruction for some graduate students in engineering.
A few engineering graduates should be encouraged to take graduate degrees in statistics
and to make careers in the field of applied statistics.





In a successful undergraduate statistics course for engineering students, the problems
and illustrations should be selected with two purposes in mind. One purpose, of course,
should be to develop the principles of probability and statistical method. The other,
equally important if these engineering graduates are to persuade their colleagues and
superiors to adopt the statistics point of view in approaching engineering problems, should
be to demonstrate how statistical method provides a useful guide to action in many different
engineering situations. Applications of statistics to industrial quality control provide
particularly good problems and illustrative examples which serve this second purpose.

15. Statistical Problems of Medical Diagnosis. Jerzy Neyman, University of
California, Berkeley.

“Diagnosis” is used to describe the outcome of a strictly defined test $T$, such as the
Wassermann test, which may lead to either of two possible outcomes, “positive” or “negative”.
Cases contemplated are such that at the time the test $T$ is performed it is impossible to
verify its verdict for certain, and the best one can do is to repeat the test. It is postulated
that to each individual of a population there corresponds a probability $p$ that the test $T$
will give a positive outcome. The value of $p$ may vary from one individual to another.
It is presumed that as $p$ increases, the illness in the patient increases. The problems of
comparing two alternative tests and of estimating the distribution of $p$ reduce to
problems relating to the distribution of $X$ = number of positive outcomes in $n$ independent
diagnoses. The statistical machinery suggested is that of BAN estimates (Public Health
Reports, Vol. 62 (1947), p. 1449). The principal result reported is that, with the mathematical
model used in the paper quoted, the empirical variances of four BAN estimates computed
for 205 samples of 1000 elements each agreed reasonably with the theoretical asymptotic
values. Empirical distributions of three of these estimates did not show deviations from
normality; that of the fourth was non-normal. It seems therefore that the asymptotic
procedure of BAN estimates may be adequate for similar analyses.
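The reduction to the distribution of $X$ works by mixing binomials over the distribution of $p$: $P(X = x) = \int \binom{n}{x} p^x (1-p)^{n-x}\,dF(p)$. The following is a minimal Python sketch for a discrete distribution of $p$. The support points and weights are hypothetical inputs, Neyman’s four-parameter model is not reproduced here, and `positives_distribution` is a hypothetical name.

```python
from fractions import Fraction
from math import comb

def positives_distribution(n, p_values, weights):
    """P(X = x) for x = 0..n, where X = number of positives in n independent diagnoses.

    The individual's probability p is drawn from a discrete distribution
    putting weight weights[i] on p_values[i]; given p, X is binomial(n, p).
    Passing Fractions gives exact probabilities.
    """
    return [
        sum(w * comb(n, x) * p ** x * (1 - p) ** (n - x)
            for p, w in zip(p_values, weights))
        for x in range(n + 1)
    ]
```

Fitting a model for $F$ then amounts to matching these probabilities to the observed frequencies of 0, 1, ..., $n$ positives.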

16. Power of Certain Tests Relating to Medical Diagnosis. C. L. Chiang and
J. L. Hodges, Jr., University of California, Berkeley.

Associate with each individual in a population $\pi$ the probability $p$ that he will be found
tubercular when examined by a standard X-ray technique. Yerushalmy and others [J. Am.
Med. Assn., Vol. 133 (1947), p. 359] performed 5 independent such diagnoses on each of 1266
persons. Neyman [Public Health Reports, Vol. 62 (1947), p. 1449] proposed a simple
four-parameter model for the distribution of $p$ in $\pi$, estimated the parameters from the data of
Yerushalmy and others, and obtained a satisfactory fit. In the present paper, the work of
Neyman is paralleled with four new models, all giving a satisfactory fit with the same data.
The five models differ considerably in shape, and in the number of repeated diagnoses which
they indicate to be necessary to detect a high proportion of those individuals having, say,
$p \ge 0.1$. Therefore further preliminary study seems indicated before one can design a mass
survey to detect a high proportion of such persons. The approximate power of the $\chi^2$ test
of the Neyman model is considered, using one of the other models as alternative. It is
found that to obtain power 0.7 with level of significance 0.05, it would be necessary to
diagnose 5290 individuals 5 times each.
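How many repeated diagnoses are needed depends on what “detect” means. The following is a minimal Python sketch under two assumptions of ours: an individual with probability $p$ is detected if at least one of $k$ independent diagnoses is positive, and $p$ is treated as fixed. Under the models in the paper, $p$ varies across individuals, so the actual requirement differs from model to model. `diagnoses_needed` is a hypothetical name.

```python
def diagnoses_needed(p, target):
    """Smallest k with P(at least one positive in k independent diagnoses) >= target.

    Assumes each diagnosis of the individual is positive with probability p,
    independently, and that 'detected' means at least one positive result.
    """
    k, miss = 0, 1.0
    while 1.0 - miss < target:
        k += 1
        miss *= 1.0 - p  # probability that all k diagnoses are negative
    return k
```

Under these assumptions, an individual with $p = 0.1$ is detected by five diagnoses with probability of only about 0.41, and 22 diagnoses are needed to reach 0.9.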

17. Iterative Treatment of Continuous Birth Processes. T. E. Harris,
Project Rand, Santa Monica, California.

Random variables $z_n$ are defined by $z_0 = 1$, $P(z_1 = r) = p_r$, $r = 1, 2, \ldots$; if $z_n = k$,
$z_{n+1}$ is the sum of $k$ independent variates, each distributed like $z_1$. Let $x = \sum r p_r < \infty$,
$\sum r^2 p_r < \infty$, $0 < p_1 < 1$. The generating function $f(s) = \sum p_r s^r$ is said to be C.I. if there
exists a family of generating functions $f(s, t)$ with $f(s, 1) = f(s)$, $f[f(s, t), t'] = f(s, t + t')$ for all
nonnegative $t$ and $t'$. A necessary and sufficient condition that $f(s)$ be C.I. is that the
numbers $a_r$, $r = 2, 3, \ldots$, be nonnegative; the $a_r$ are determined recursively by requiring
that the power series $\xi(s) = -s + \sum_{r=2}^{\infty} a_r s^r$ satisfy formally the functional equation
$\xi(s) f'(s) = \xi[f(s)]$. The problem is connected with classical works on iteration. If $f(s)$ is C.I., the
given Markoff process can be imbedded in a continuous birth process. If $\xi(s)$ is given, the
m.g.f. $\phi(s)$ of the asymptotic distribution of the variate $z_n/x^n$ may be determined from the
formula

$$\phi^{-1}(s) = (s - 1)\exp\left\{\int_1^s \left[\frac{\xi'(1)}{\xi(y)} - \frac{1}{y - 1}\right] dy\right\}.$$

Various properties of the corresponding distribution can be inferred from this expression.
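The recursion for the $a_r$ can be carried out mechanically by equating coefficients of $s^r$ in $\xi(s) f'(s) = \xi[f(s)]$. The unknown $a_r$ enters the two sides with coefficients $p_1$ and $p_1^r$, so each $a_r$ is fixed by the earlier ones when $0 < p_1 < 1$. The following is a minimal exact-arithmetic sketch in Python, with the series truncated at degree $N$. The names `poly_mul` and `generator_coefficients` are hypothetical.

```python
from fractions import Fraction

def poly_mul(a, b, n):
    """Product of two power series (coefficient lists) truncated at degree n."""
    out = [Fraction(0)] * (n + 1)
    for i, ai in enumerate(a[: n + 1]):
        for j, bj in enumerate(b[: n + 1 - i]):
            out[i + j] += ai * bj
    return out

def generator_coefficients(p, n):
    """a_2, ..., a_n of xi(s) = -s + sum a_r s^r solving xi(s) f'(s) = xi(f(s)).

    p lists p_1, p_2, ... (probabilities of z_1 = 1, 2, ...); requires 0 < p_1 < 1.
    """
    f = [Fraction(0)] + [Fraction(x) for x in p[:n]]
    f += [Fraction(0)] * (n + 1 - len(f))
    p1 = f[1]
    if not 0 < p1 < 1:
        raise ValueError("need 0 < p_1 < 1")
    fprime = [(k + 1) * f[k + 1] for k in range(n)] + [Fraction(0)]
    powers = [None, f]  # powers[k] = f(s)^k, truncated at degree n
    for _ in range(2, n + 1):
        powers.append(poly_mul(powers[-1], f, n))
    xi = [Fraction(0), Fraction(-1)] + [Fraction(0)] * (n - 1)
    for r in range(2, n + 1):
        # Coefficient of s^r: terms not involving a_r on each side.
        rhs = sum(xi[k] * powers[k][r] for k in range(1, r))
        lhs = sum(fprime[r - j] * xi[j] for j in range(1, r))
        xi[r] = (rhs - lhs) / (p1 - p1 ** r)
    return {r: xi[r] for r in range(2, n + 1)}
```

For the geometric law $p_r = 2^{-r}$ this gives $a_2 = 1$ and $a_r = 0$ for $r \ge 3$. That is, $\xi(s) = s^2 - s$, the generator of a simple birth process, so this $f$ is C.I. A negative $a_r$ would show that $f$ is not C.I.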


18. Estimation of Means on the Basis of Preliminary Tests of Significance.
Blair M. Bennett, University of California, Berkeley.

This paper examines the statistical procedure of pooling two sample means on the basis
of the results of one or more preliminary tests of significance. Let $x_i$ ($i = 1, \ldots, N_1$)
represent a sample of $N_1$ observations from a normal population $\pi_1(\xi, \sigma_1)$, and $y_i$ a sample of
$N_2$ observations from $\pi_2(\eta, \sigma_2)$. An estimate of $\xi$ which is commonly used in certain practical
situations is given by $x' = \bar x$, or $x' = (N_1\bar x + N_2\bar y)/(N_1 + N_2)$, according as the sample
means $\bar x, \bar y$ do or do not differ significantly on the basis of a preliminary test. The
distribution of the estimate $x'$ is determined, according as the common variance $\sigma_1^2 = \sigma_2^2$ is known or unknown. In both
situations, the maximum (or minimum) bias is computed as a function of various levels of
significance of the preliminary test of equality of means. Also, the mean square error of
the estimate $x'$ is calculated in both cases. If now equality of variances cannot be assumed,
but an F-test of the sample variances $s_1^2, s_2^2$ does not indicate any significant difference,
then in practice $\bar x, \bar y$ may be pooled, the weights being inversely proportional to the sample
variances. Thus, the usual estimate of $\xi$ will be of the form $x' = \bar x$, or $x' = (N_1\bar x/s_1^2 +
N_2\bar y/s_2^2)/(N_1/s_1^2 + N_2/s_2^2)$, according as $\bar x$ and $\bar y$ do or do not differ significantly on the
basis of the Student t-test, subsequent to an F-test. The bias and mean square error of this
estimate have been computed with the aid of the conditional power function of the t-test
subsequent to an F-test.
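The estimator studied in the first situation can be written down directly. The following is a minimal Python sketch for the known, common-variance case only, using a two-sided z-test as the preliminary test. The level $\alpha$ and the name `preliminary_test_estimate` are assumptions of ours, not the paper’s.

```python
from statistics import NormalDist, fmean

def preliminary_test_estimate(x, y, sigma, alpha=0.05):
    """Estimate of xi after a preliminary two-sided z-test of equal means.

    Assumes both samples are normal with a common, known sigma.  Returns
    mean(x) if the means differ significantly at level alpha, otherwise
    the pooled mean (N1 * xbar + N2 * ybar) / (N1 + N2).
    """
    n1, n2 = len(x), len(y)
    xbar, ybar = fmean(x), fmean(y)
    z = (xbar - ybar) / (sigma * (1 / n1 + 1 / n2) ** 0.5)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    if abs(z) > crit:
        return xbar                           # means differ: use x alone
    return (n1 * xbar + n2 * ybar) / (n1 + n2)  # pool the two means
```

Because the choice between the two formulas depends on the data, $x'$ is biased when $\eta \ne \xi$. The bias is largest for moderate differences that the preliminary test often fails to detect.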

19. Note on Power of the F Test. Stanley W. Nash, University of California,
Berkeley.

Assuming “treatment” expectations to be normal random variables, the ratio of the
sum of squares due to treatments to the sum of squares due to error has a central F
distribution in the cases of randomized blocks, Latin squares, and one-way classifications.
The F statistic converges in probability to a constant as the number of treatments is
increased. This constant is one plus a multiple of the variance between treatment expectations.
The power of the F test increases monotonically to one as the number of treatments is
increased. This power can be calculated using tables of the incomplete beta function.
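For the one-way classification, the limiting constant can be seen in simulation. The setup is $k$ treatments with $n$ replicates each, treatment expectations drawn as $N(0, \sigma_\tau^2)$, and errors drawn as $N(0, \sigma^2)$. The ratio of mean squares then tends to $1 + n\sigma_\tau^2/\sigma^2$ as $k$ grows. The following is a minimal Monte Carlo sketch in Python; the setup and the name `one_way_f` are assumptions for illustration.

```python
import random

def one_way_f(k, n, sigma_tau, sigma, rng):
    """Simulated F = MS(treatments) / MS(error) for a one-way layout.

    k treatments with n replicates each; treatment expectations are drawn
    as N(0, sigma_tau^2) and errors as N(0, sigma^2).
    """
    groups = []
    for _ in range(k):
        mu = rng.gauss(0.0, sigma_tau)  # random treatment expectation
        groups.append([mu + rng.gauss(0.0, sigma) for _ in range(n)])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    ss_treat = n * sum((m - grand) ** 2 for m in means)
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_treat / (k - 1)) / (ss_error / (k * (n - 1)))
```

With $k = 2000$, $n = 5$, $\sigma_\tau^2 = 0.5$ and $\sigma = 1$, the simulated ratio should lie close to $1 + 5 \times 0.5 = 3.5$. Any fixed critical value is then exceeded with probability approaching one.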


20. Best Asymptotically Normal Estimates. E. W. Barankin and J.
Gurland, University of California, Berkeley.

The methods of minimum $\chi^2$ developed by Neyman for obtaining BAN (best asymptotically
normal) estimates of the parameters appearing in the multinomial distribution
are generalized to obtain certain optimum types of estimates in the case of an arbitrary
distribution under certain restrictions. Let the random vector $X$ have the probability
density $v(x, \theta)$ in the absolutely continuous case and let $v(x, \theta) = P[X = x \mid \theta]$ in the discrete
case, where $\theta$ is a fixed vector in the parameter space. Functions $\phi_i(X)$, ($i = 1, 2, \ldots, r$),
are selected for the purpose of forming estimates; these estimates are taken to be functions
of the sample moments $\bar\phi_i = \frac{1}{N}\sum_{j=1}^{N} \phi_i(x_j)$. Certain quadratic forms which depend on the
choice of functions $\phi_1(X), \phi_2(X), \ldots, \phi_r(X)$ are minimized with respect to the parameters.
In this manner, asymptotically normal estimates are obtained which are consistent, and
have minimum asymptotic variances within the class of estimates so determined by the
particular functions $\phi_1, \phi_2, \ldots, \phi_r$. It is possible, through a modification of this
procedure, to obtain estimates by solving a set of linear equations. If $v(x, \theta)$ has the form

$$v(x, \theta) = \exp\left\{\beta_0(\theta) + \sum_{i=1}^{q} \beta_i(\theta)\, y_i(x) + y_0(x)\right\},$$

it can be shown that the best choice of the $\phi$’s is $y_1(x), y_2(x), \ldots, y_q(x)$. Maximum
likelihood estimates belong to this class of BAN estimates.
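The Poisson law is a one-parameter case of the exponential form above, with $y_1(x) = x$. The following is a minimal Python illustration under assumptions of ours. It takes $\phi(x) = x$ and minimizes the quadratic form $N(\bar\phi - E_\theta\phi)^2/\mathrm{Var}_\theta\,\phi = N(\bar x - \theta)^2/\theta$ by golden-section search. The minimizer is $\bar x$, which is also the maximum likelihood estimate, as the abstract leads one to expect. The function names and the particular quadratic form are illustrative, not the authors’ construction.

```python
def golden_min(f, lo, hi, iters=200):
    """Minimise a unimodal function on [lo, hi] by golden-section search."""
    g = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if f(c) < f(d):      # minimum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                # minimum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

def min_quadratic_form_poisson(sample):
    """Poisson mean estimate minimising N * (xbar - theta)^2 / theta (phi(x) = x)."""
    n = len(sample)
    xbar = sum(sample) / n

    def q(theta):
        return n * (xbar - theta) ** 2 / theta  # E(x) = Var(x) = theta

    return golden_min(q, 1e-9, 10 * xbar + 10)
```

Setting the derivative of $(\bar x - \theta)^2/\theta$ to zero gives $(\bar x - \theta)(\bar x + \theta) = 0$, so the numerical search should return $\bar x$.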



BOOK REVIEW 


The Theory of Games and Economic Behavior. John von Neumann and Oskar
Morgenstern. Princeton University Press, 1947, Second Edition, pp. xviii,
641. $10.00


Reviewed by Leonid Hurwicz¹
Iowa State College

This review is devoted to the second edition of a book which from its first 
appearance was acknowledged to be a major contribution in the field of theory 
of rational behavior. As is pointed out in the Preface, "the second edition 
differs from the first in some minor respects only”. The main change is the 
addition of a proof (of "measurability” of utility) omitted in the first edition. 

The book’s objective is to solve the problem of rational behavior in a very 
general type of situation. 

It is, therefore, not surprising that its results are of relevance in many fields
of knowledge, among them economics and statistical inference. 

In both economics and statistics the problem of rational behavior is a fundamental
one. Thus one of the classical problems treated by economic theory
is that of profit maximization by a firm. The firm is assumed to be maximizing
its net profit, which is a function of prices of the product, materials used, etc., as
well as the quantities used and produced. In the simplest case prices are taken
as given; more generally they are assumed to be functions (known to the firm)
of the quantities sold and purchased. But assuming this function to be known
presupposes the knowledge of behavior of other firms. This procedure has for
a long time been regarded as highly unsatisfactory; it is analogous to elaborating
the theory of rational behavior of a poker player on the assumption that he knows
the strategy of the other players.²

It is the type of situation in which not only the behavior of various individuals,
but even their strategies, are interdependent, that is treated by von Neumann
and Morgenstern. The essence of their solutions is to base the optimal
strategy on the minimax principle. As applied to a game, the principle requires
that one should choose a strategy which minimizes the maximum loss
that could be inflicted by the opponent.

The minimax principle, when applied by both players, need not, in general,
lead to a stable solution. To ensure the existence of such a solution the authors
are led to the postulate that the choice of strategies be made through a random
process. The minimax to be found is that of the mathematical expectation of
the loss in the game. The latter postulate is of a restrictive nature, since it
implies that the game is played for numerical (“measurable”) stakes and that
the second and higher moments of the probability distribution of the losses are
immaterial. This restriction, however, has permitted the authors to go deeper
in other directions. Given the great complexity of the problem, even in its
restricted version, the authors’ decision can hardly be criticized. One could
only wish that similar considerations had made the authors more tolerant towards
other work in the field of economics than is shown in some sections of the book.

¹ On leave to the United Nations Economic Commission for Europe.

² See Jacob Marschak, “Neumann's and Morgenstern’s New Approach to Static
Economics”, The Journal of Political Economy, Vol. LIV (1946).
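For the smallest non-trivial case, a 2 × 2 zero-sum game, minimax strategies have a well-known closed form. If a pure-strategy saddle point exists it is used. Otherwise each player randomizes so as to make the opponent indifferent, which is the random choice of strategies described above. The Python below is a sketch of that textbook solution, not code from the book, and `solve_2x2` is a hypothetical name. Payoffs are to the row player, so the row player's gain is the opponent's loss.

```python
from fractions import Fraction

def solve_2x2(a, b, c, d):
    """Minimax solution of the zero-sum game [[a, b], [c, d]] (payoffs to the row player).

    Returns (p, q, v): p = probability that the row player picks row 1,
    q = probability that the column player picks column 1, v = value of the game.
    """
    a, b, c, d = map(Fraction, (a, b, c, d))
    rows = [(a, b), (c, d)]
    maximin = max(min(row) for row in rows)  # best guaranteed gain with a pure row
    minimax = min(max(a, c), max(b, d))      # smallest maximum loss with a pure column
    if maximin == minimax:                   # saddle point: pure strategies suffice
        i = 0 if min(rows[0]) == maximin else 1
        j = 0 if max(a, c) == minimax else 1
        return Fraction(int(i == 0)), Fraction(int(j == 0)), maximin
    denom = a - b - c + d                    # nonzero when there is no saddle point
    p = (d - c) / denom                      # equalizes the row player's expected gain
    q = (d - b) / denom                      # equalizes the column player's expected loss
    v = (a * d - b * c) / denom
    return p, q, v
```

For the payoff matrix [[3, -1], [-2, 1]] there is no saddle point. The row player uses row 1 with probability 3/7, the column player uses column 1 with probability 2/7, and the expected value of the game is 1/7 whatever the opponent does.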

The readers of the Annals will be particularly interested in the connection
between the Theory of Games and the theory of statistical inference.

As has been pointed out by Abraham Wald,³ the problem faced by the statistician
is somewhat similar to that of a player in a game of strategy. The theory
of statistical inference may be viewed as a theory of rational behavior of the
statistician. His “strategy” consists in adopting an optimal test or estimate,
more generally an optimal decision function. This optimal decision function
must be chosen without the knowledge of the “a priori” distribution of the
population parameters. Wald’s basic postulate of minimization of maximum risk
is equivalent to regarding the statistician as a player in a game of strategy, with
“Nature” as the other player. The optimal decision function is chosen in a
way which (as shown by Wald) is equivalent to assuming the “least favorable”
a priori distribution of the parameters. As Wald says, “we cannot say that
Nature wants to maximize [the statistician’s risk]. However, if the statistician
is completely ignorant as to Nature’s choice, it is perhaps not unreasonable to
base the theory of a proper choice of [the decision function] on the assumption
that Nature wants to maximize [the statistician's risk]”.

It may be noted, however, that statistical inference, as seen by Wald, is a 
relatively simple game since it involves only two players and is of the zero-sum 
variety. 

The admiring and enthusiastic reception given to the book’s first edition would
make any further general appraisal somewhat anticlimactic. Suffice it to say
that a good deal of valuable work has already been stimulated by the Theory of
Games, both in the field of social sciences and in mathematics.

³ Abraham Wald, “Statistical Decision Functions which Minimize the Maximum Risk”,
Annals of Mathematics, Vol. 46 (1945).



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Dr. Paul H. Anderson, formerly an Economist with the War Assets
Administration, Washington, D. C., has been appointed Professor of Marketing at
Loyola University, New Orleans, Louisiana.

Mr. N. H. Carrier has resigned his position with the Mathematical Statistics
Section, Chief Scientific Advisers Division, Ministry of Works, England, to
accept an appointment as Statistician in the General Register Office, Somerset
House, Strand, London, W. C. 2, England.

Dr. T. Freeman Cope has been promoted to a full professorship at Queens
College, Flushing, New York.

Dr. Wayne W. Gutzman, who was formerly at the Postgraduate School,
Naval Academy, Annapolis, as an Assistant Professor, has accepted a
professorship in the Department of Mathematics, University of South Dakota.

Mr. Elvin A. Hoy has transferred from the position as Chief, Statistics
Section, Bureau of Research and Statistics in the Social Security Administration, to
the position as Chief, Research Evaluation Section, Naval Reserve Training
Publications, Navy Department, Naval Gun Factory, Washington, D. C.

Dr. Joe J. Livers has been promoted to a full professorship at Montana State
College, Bozeman, Montana. 

Professor Ernest S. Keeping has returned to his position at the University of
Alberta, Edmonton, Alberta, Canada, after having spent the spring term of 1948
at the Institute of North Carolina.

Mr. Wharton F. Keppler of the M&R Dietetic Laboratories, Inc., Columbus,
Ohio, has recently qualified as a Professional Industrial Engineer in the State of
Ohio.

Mr. Ralph Mansfield has formed his own company to manufacture electrical
testing equipment. The company is known as the Auto-Test, Incorporated,
with Mr. Mansfield acting as Vice-president and Chief Engineer.

Mr. Jack Moshman has resigned an instructorship in mathematics at the
University of Tennessee to accept the post of Statistician to the Medical Advisor,
United States Atomic Energy Commission, Oak Ridge, Tennessee.

Mr. Bernard E. Phillips has resigned his position as teacher of mathematics
in the Newark, New Jersey high schools to do statistical work for the Glenn L.
Martin Co., Baltimore, Maryland.

Dr. W. R. Van Voorhis, Associate Professor of Mathematics, Fenn College,
attended, as a representative of the Institute of Mathematical Statistics, the
inauguration ceremonies of Dr. Keith Glennan as President of Case Institute of
Technology, Cleveland, Ohio.
Atomic Energy Commission Fellowships

The National Research Council is announcing a new program of fellowships
supported by funds provided by the Atomic Energy Commission as a part of the
Commission’s responsibility for future atomic energy research. Accordingly,
these fellowships will be awarded in the many fields of science related to research
in atomic energy.

A considerable number of these fellowships is available to young men and
women who wish to continue in graduate training or research for the doctorate
in an appropriate field of science. Others of these fellowships will provide
training in biophysics applied to the control of radiation hazards. An additional
number of fellowships will be assigned to those below the age of 35 who have
already achieved the doctorate and who wish to secure advanced research
training and experience in those aspects of the physical, biological and medical
sciences related to atomic energy. Tenure of the fellowship does not impose on
the fellow any commitment with regard to subsequent employment.

The candidates will be selected by the fellowship boards of the National Research
Council established for this program. In the postdoctoral field, there
will be three groups of fellowships, the basic stipend of which will be $3000. For
the selection of fellows for advanced research and training in the general field of
the physical sciences, a board has been established with Dr. Roger Adams,
Professor of Chemistry, University of Illinois, as chairman. In the general
field of the biological sciences, exclusive of the medical sciences, selections of
postdoctoral fellows will be made by a board under the chairmanship of Dr. R.
G. Gustavson, Chancellor of the University of Nebraska. For the selection of
postdoctoral fellows in the medical sciences, a board has been set up under the
chairmanship of Dr. Homer W. Smith, Professor of Physiology, College of
Medicine, New York University.

The program provides for two groups of fellows in the predoctoral field, with
stipends ranging from $1500 to $2100. One group of fellows will work in the biological
and basic medical sciences, including applied biophysics related to the
measurement and control of radiation hazards and the effect of radiation upon
health. Selections will be made by a board under the chairmanship of Dr.
Douglas Whitaker, Professor of Biology and Dean of the School of Biological
Sciences, Stanford University. Another group of predoctoral fellows will be
selected for study and research in the general field of the physical sciences.
Selections will be made by a board under the chairmanship of Dr. Henry A.
Barton, Director of the American Institute of Physics.

Fellowships will be granted for study and research in universities or other
nonprofit research establishments approved by the fellowship boards. Awards
will be made for the academic year 1948-49. Supervision of a fellow's program
of work will be under the direction of the fellowship boards of the National
Research Council. Further information can be secured by writing to the
Fellowship Office, National Research Council, 2101 Constitution Avenue,
Washington 25, D. C.
Research Fellowships in Psychometrics

The Educational Testing Service, Princeton, N. J., is offering for 1949-50 its
second series of research fellowships in psychometrics leading to the Ph.D.
degree at Princeton University. Open to men who are acceptable to the Graduate
School of the University, the two fellowships carry a stipend of $2,200 a
year and are normally renewable.

Fellows will be engaged in part-time research in the general area of psychological
measurement at the offices of the Educational Testing Service and will,
in addition, carry a normal program of studies in the Graduate School. Competence
in mathematics and psychology is a prerequisite for obtaining these
fellowships. Information and application blanks may be obtained from:
Director of Psychometric Fellowship Program, Educational Testing Service,
Box 592, Princeton, N. J.


Preliminary Actuarial Examinations 
Prize Awards 


The winners of the prize awards offered by the Actuarial Society of America 
and the American Institute of Actuaries to the nine undergraduates ranking 
highest on the combined score on Part 1 and Part 2 of the 1948 Preliminary 
Actuarial Examinations are as follows:


First Prize of $200:
Edward H. Larson, Massachusetts Institute of Technology

Additional Prizes of $100:
John E. Brownlee, Haverford College
William L. Farmer, University of Alabama
Joseph P. Fennell, Princeton University
Bert F. Green, Jr., Yale University
Solomon Leader, Rutgers University
Felix A. E. Pnam, University of Western Ontario
Richard J. Semple, University of Toronto
Charles A. Yardley, Dartmouth College


The two actuarial organizations have authorized a similar set of nine prize 
awards for the 1949 Examinations on Part 2. 

The Preliminary Actuarial Examinations consist of the following three examinations:


Part 1. Language Aptitude Examination.
(Reading comprehension, meaning of words and word relationships, antonyms, and
verbal reasoning.)

Part 2. General Mathematics Examination.
(Algebra, trigonometry, coordinate geometry, differential and integral calculus.)

Part 3. Special Mathematics Examination.
(Finite differences, probability and statistics.)
The 1949 Preliminary Actuarial Examinations will be administered by the
Educational Testing Service at centers throughout the United States and
Canada on May 13-14, 1949. The closing date for applications is March 15,
1949.

Detailed information concerning the Examinations can be obtained from either 
of the following organizations: 

American Institute of Actuaries, 

135 South LaSalle Street, 

Chicago 3, Illinois. 

The Actuarial Society of America, 

393 Seventh Avenue, 

New York 1, New York. 


New Members 

The following persons have been elected to membership in the Institute
(March 1 to May 31, 1948):

Alder, Arthur, Ph.D. (Univ. of Berne) Professor of Actuarial Science, University of Berne,
Schlaeflistrasse 2, Berne, Switzerland.

Andrews, Fred C., B.S. (Univ. of Washington) Research Fellow, Department of Mathematics,
University of Washington, 14-1 Savery Hall, University of Washington, Seattle,
Washington.

Archer, John, Actuary, Pensions Section, Lever Brothers and Unilever Ltd., BA Spencer
Hill, Wimbledon, S. W. 19, England.

Berntz, Paul A., M.A. (Stanford Univ.) 173 Serpentine Road, Tenafly, New Jersey.

Bennett, George K., Ph.D. (Yale) President of the Psychological Corporation, 522 Fifth
Avenue, New York 18, New York.

Berrettoni, J. N., Ph.D. (Univ. of Minnesota) Professional Consultant in Statistics and
Economics, 032 Erie St., S. E., Minneapolis If, Minnesota.

Birnbaum, Allan, A.B. (Univ. of Calif., Los Angeles) Teaching Assistant, Mathematical
Statistics Department, Columbia University, BOO Riverside Drive, Room 434, New
York 27, New York.

Blank, Mark, M.A. (Univ. of Pennsylvania) Instructor of Philosophy, University of Pennsylvania,
228 E. Sedgwick, Philadelphia, Pa.

Blumen, Isadore, B.A. (Univ. of Minn.) Student, Department of Mathematical Statistics,
University of North Carolina, Chapel Hill, North Carolina.

Burdick, Wayne E., M.A. (Univ. of Mich.) Student, University of Michigan, 514 S. Fifth
Avenue, Ann Arbor, Michigan.

Chaturvedi, Jagdish C., M.Sc. (Agra Univ., India) Lecturer in Statistics, St. John's College,
37, Delhi Gate, Agra, U. P., India.

Cote, Louis J., A.M. (Univ. of Mich.) Student, University of Michigan, 315 North State
Street, Ann Arbor, Michigan.

Dunleavy, Mary, A.B. (Hunter College, New York) Statistician, E. I. du Pont de Nemours,
657 Second Avenue, New York 16, New York.

Ferber, Robert, M.A. (Univ. of Chicago) Student, University of Chicago, 54 West 89th Street,
New York 24, New York.

Forman, John W., M.S. (Univ. of Iowa) Graduate Assistant, Department of Mathematics,
State University of Iowa, Iowa City, Iowa.



Franklin, Nathan M., M.S. (Univ. of Mich.) Student, Univ. of Michigan, Box 195, Moodus,
Connecticut.

Fraser, Donald A. S., M.A. (Univ. of Toronto) Instructor in Statistics, Graduate College,
Princeton, New Jersey.

Grabowski, Edwin F., A.B. (George Washington Univ.) Student, George Washington University,
ISSO-SOlh Street, N. W., Washington, D. C.

Healy, William C., Jr., B.S.E. (Univ. of Mich.) Student, University of Michigan, 689 Lincoln,
Grosse Pointe, Michigan.

Heimdahl, Olaf E. W., A.B. (Luther College, Washington) Teaching Fellow, Department of
Mathematics, University of Washington, 4S86 Union Bay Lane, Seattle 6, Washington.

Henriksen, Robert O., B.Sc. (Univ. of Mich.) Student, University of Michigan, 161 Clancy
Avenue, Grand Rapids, Michigan.

Howard, William G., B.S. (Western Carolina Teachers College, Cullowhee, N. C.) Student,
Institute of Statistics, University of North Carolina, Route 1, Morrisville, North
Carolina.

Irick, Paul E., M.S. (Purdue Univ.) Mathematics Instructor, Purdue University, 129
North Grant St., West Lafayette, Indiana.

Johnson, Elgy S., M.A. (Univ. of Mich.) Student, University of Michigan, 18907 Lincoln
Street, Detroit 8, Michigan.

Kaplan, E. L., B.S. (Carnegie Inst. of Tech.) Mathematician, Naval Ordnance Laboratory,
1427 N St., N. W., Washington 6, D. C.

Kaufman, Arthur, M.A. (Columbia Univ.) Student and Lecturer of Mathematics, Columbia
University, 1880 Sheridan Avenue, New York 56, New York.

Link, Richard F., B.S. (Univ. of Oregon) 160 W. Sixth St., Eugene, Oregon.

Marks, Charles L., M.A. (Univ. of North Carolina) Instructor of Mathematics, University
of North Carolina, 813 Mangum Dormitory, University of North Carolina, Chapel Hill,
North Carolina.

Marquardt, Mary, M.A. (Univ. of Illinois) Assistant Professor of Statistics, New York State
School of Industrial and Labor Relations, Cornell University, Ithaca, New York.

Mickey, Max R., Jr., B.S. (Virginia Polytechnic Institute) Graduate Student and Graduate
Assistant, Department of Mathematics, Iowa State College, 706 Ash Avenue, Ames,
Iowa.

Mindlin, Albert, B.A. (Univ. of California, Los Angeles) Teaching Assistant, Mathematics
Department, University of California, 21^44 Carlslon Street, Berkeley 4, California.

Morris, William S., A.B. (Princeton) Statistician, First Boston Corporation, 100 Broadway,
New York 5, New York.

Netzorg, Morton J., Engineer, Development Tire Engineering Department, U. S. Rubber
Co., Detroit, Michigan, £628 Gladstone, Detroit 6, Michigan.

Norton, James A., Jr., A.B. (Antioch College) Graduate Research Assistant, Veterans
Guidance Center, Purdue University, West Lafayette, Indiana.

Perrin, John K., A.B. (Columbia College) Assistant Statistician, American Telephone &
Telegraph Co., 195 Broadway, New York 7, New York.

Perry, Norman C., M.A. (Univ. of Southern Calif.) Lecturer in Mathematics, Mathematics
Department, University of Southern California, Los Angeles, California.

Powell, Charles, Jr., Actuary, Coates and Herfurth, Consulting Actuaries, 116 S. Virgil
Avenue, Los Angeles 4, California.

Raiffa, Howard, B.S. (Univ. of Mich.) Student, University of Michigan, 1441 Enfield Court,
Willow Run Village, Michigan.

Raup, Joan E., B.A. (Barnard College) Statistical Analyst, Bureau of the Budget, 1438 N
Street, N. W., Washington 6, D. C.

Rubinstein, David, B.S. (Univ. of Wash.) Research Assistant, Statistical Laboratory, University
of California, 8216 Parker Street, Berkeley 4, California.
Schlenz, John W., B.S. (Baldwin-Wallace College) Student, University of Michigan, 8806
Vineyard Avenue, Cleveland 5, Ohio.

Scott, Elizabeth L., A.B. (Univ. of California) Research Assistant, Statistical Laboratory,
Department of Mathematics, University of California, Berkeley 4, California.

Seidman, Herbert, B.A. (Brooklyn College) Junior Statistician, Chief, Statistical Information
Section, New York University and Student, New York University, 8110 New
York Avenue, Brooklyn 10, New York.

Shaw, Oliver A., B.A. (Univ. of Mississippi) U. S. Air Force, 6$1 Brooks Lane, N. W.,
Washington, D. C.

Shellard, Gordon D., B.S. (Mass. Institute of Tech.) Assistant Section Head, Underwriting
Studies Section, Actuarial Division, Metropolitan Life Insurance Co., 0 Mountain
Avenue, Ridgewood, New Jersey.

Shepherd, Clarence M., M.S. (Case Institute of Tech.) Electrochemical Research Chemist,
39S9 Nichols Avenue, S. W., Washington, D. C.

Shrikhande, Sharad-Chandra S., B.Sc. (Nagpur Univ., India) Graduate Student, Department
of Mathematical Statistics, University of North Carolina, Chapel Hill, North
Carolina.

Sirlin, Robert, M.A. (Columbia Univ.) Statistician, Financial Analysis, 80$ East 23rd
Street, Brooklyn 29, New York.

Stacy, Edney W., A.B. (Univ. of North Carolina) Instructor of Mathematics, University
of North Carolina, 301 W. Franklin Street, Chapel Hill, North Carolina.

Sternhell, Charles M., B.S. (College of City of N. Y.) Section Head, Actuarial Division,
Metropolitan Life Insurance Co., 1 Madison Avenue, New York City, New York.

Tang, Pei-Ching, Ph.D. (Univ. College, London Univ.) Professor, National Central University,
Nanking, China.

Whitson, Milo E., A.M. (Geo. Peabody College, Nashville) Head of Mathematics Department,
California State Polytechnic College, 538 Lawrence Dr., San Luis Obispo,
California.

Watson, Geoffrey S., B.A. (Univ. of Melbourne) Student, Institute of Statistics, State
College, Raleigh, North Carolina.

Woolsey, Theodore D., B.A. (Yale Univ.) Biostatistician, Division of Public Health Methods,
U. S. Public Health Service, 111 West Underwood St., Chevy Chase 15, Maryland.

Wymer, John P., M.A. (Univ. of California, Berkeley) Statistician, U. S. Bureau of Labor
Statistics, 119 Whittier St., N. W., Washington 18, D. C.

Yerushalmy, Jacob, Ph.D. (Johns Hopkins Univ.) Professor of Biostatistics, School of
Public Health, University of California, Berkeley 4, California.



REPORT ON THE BERKELEY MEETING OF THE INSTITUTE 


The Thirty-fourth Meeting and the Third Regional West Coast Meeting of
the Institute of Mathematical Statistics was held on the Berkeley Campus of the
University of California June 22nd through June 24th, 1948, in conjunction
with the Twenty-ninth Annual Meeting of the Pacific Division of the American
Association for the Advancement of Science. During the meeting 115 persons
registered, including the following members of the Institute:

G. A. Baker, Blair M. Bennett, Carl A. Bennett, Z. Wm. Birnbaum, David Blackwell,
Albert H. Bowker, George W. Brown, A. George Carlton, Douglas G. Chapman, Andrew G.
Clark, Edwin L. Crow, Dorothy Cruden, Harold Davis, R. C. Davis, W. J. Dixon, Robert
Dorfman, George Eldredge, Lillian Elveback, Mary Elveback, Benjamin Epstein, M. W.
Eudey, Evelyn Fix, Merrill M. Flood, H. H. Germond, Meyer A. Girshick, Eugene L.
Grant, John Gurland, T. E. Harris, J. L. Hodges, Jr., Paul G. Hoel, John M. Howell, Harry
M. Hughes, Leo Katz, H. S. Konijn, T. C. Koopmans, George W. Kuznets, E. L. Lehmann,
Richard F. Link, A. M. Mood, Stanley W. Nash, J. Neyman, Stefan Peters, G. Baley Price,
Kathryn B. Rolfe, Leonard J. Savage, Henry Scheffé, Ifmvaid L. Selvug, Elizabeth L.
Scott, Esther Seiden, Milton Sobel, Zenon Szatrowski, John W. Tukey, J. R. Vatnsdal,
A. Wald, John E. Walsh, Holbrook Working, Zivia S. Wurtele.

The Tuesday morning session was devoted to a symposium on Mathematical
Theory of Games with Professor G. C. Evans of the University of California,
Berkeley, as chairman. Addresses were:

1. Survey of von Neumann's mathematical theory of games. J. C. C. McKinsey, Project
Rand.

2. Recent developments in the mathematical theory of games. Olaf Helmer, Project Rand.

3. Applications of theory of games to statistics. Abraham Wald, Columbia University.

4. On continuous games. Henri F. Bohnenblust, California Institute of Technology.

5. Discussion. Edward W. Barankin, University of California, Berkeley.

The attendance was approximately 130.

The Tuesday afternoon session was under the chairmanship of Professor Henri
F. Bohnenblust of the California Institute of Technology. The invited address,
Complete Classes of Statistical Decision Functions, by Professor Abraham Wald,
was followed by tea in Senior Women's Hall and then the following contributed
papers:

1. Identification as a problem of inference. T. C. Koopmans, Cowles Commission for
Research in Economics.
Discussion: Olav Reiersøl, University of Oslo.

2. Estimation of parameters for truncated multinomial distributions. Z. W. Birnbaum,
E. Paulson and F. C. Andrews, University of Washington.

3. A test of the hypothesis that a sample of three came from the same normal distribution.
Carl A. Bennett, General Electric Company.

4. A note on the application of the abbreviated Doolittle solution to nonorthogonal analysis
of variance and covariance. (By title.) Carl A. Bennett, General Electric Company.

The attendance was between 100 and 150 during the afternoon.
The Wednesday morning session was devoted to a symposium on Design of
Experiments with Particular Reference to Agricultural Trials. Dean A. R.
Davis of the University of California, Berkeley, presided briefly and then
Professor Abraham Wald took over the duties of chairman. The papers were:

1. Recent advances in experimental design. R. C. Bose, University of Calcutta.

2. Yield trials with backcrossed derived lines of wheat. G. A. Baker and F. N. Briggs,
University of California at Davis.

3. Selecting a subset which includes the largest of a number of means. Charles Stein,
University of California, Berkeley.

4. Discussion. A. G. Clark, Colorado State College; S. W. Nash, University of California,
Berkeley; J. R. Vatnsdal, State College of Washington.

5. The effect of inbreeding on height at withers in a herd of Jersey cattle. W. C. Rollins,
S. W. Mead and W. M. Regan, University of California at Davis.

Attendance was about 100.

The afternoon session, under the chairmanship of Professor George Polya of
Stanford University, began with an invited address by Professor Michel Loève,
University of California, Berkeley, on Random Functions and Related Problems.
This was followed by the contributed papers:

1. An example of a singular continuous distribution. (By title.) Henry Scheffé,
University of California at Los Angeles.

2. On the theory of some nonparametric hypotheses. E. L. Lehmann and Charles Stein,
University of California, Berkeley.

3. Compound randomization in the binary system. John E. Walsh, Project Rand.

4. A multiple decision problem arising in the analysis of variance. Edward Paulson,
University of Washington.

5. Recurrence formulae for the moments and semi-invariants of the joint distribution of the
sample mean and variance. Olav Reiersøl, University of Oslo.

6. Identification problem in factor analysis. (By title.) Olav Reiersøl, University of
Oslo.

7. On distinct hypotheses. Mrs. Agnes Berger, Columbia University.

The attendance was approximately 100.

A symposium on Sampling for Industrial Use occupied the Thursday morning
session. Professor Z. W. Birnbaum of the University of Washington presided.

1. Sampling plans for continuous production. M. A. Girshick, Project Rand.

2. Sampling plans with continuous variables for acceptance inspection. A. H. Bowker,
Stanford University.

3. Place of statistical sampling in the education of engineers. E. L. Grant, Stanford
University.

4. Discussion. Henry Scheffé, University of California at Los Angeles; Charles Stein,
University of California, Berkeley; Holbrook Working, Stanford University.

The attendance was approximately 100.

The first part of the afternoon session, presided over by Professor W. J. Dixon,
University of Oregon, was devoted to contributed papers:

1. Statistical problems of medical diagnosis. Jerzy Neyman, University of California,
Berkeley.
Discussion: L. J. Savage, University of Chicago.

2. Power of certain tests relating to medical diagnosis. C. L. Chiang and J. L. Hodges,
University of California, Berkeley.

3. On best asymptotically normal estimates. Edward W. Barankin and John Gurland,
University of California, Berkeley.

4. Iterative treatment of continuous birth processes. T. E. Harris, Project Rand.

5. Estimation of means on the basis of preliminary tests of significance. Blair M. Bennett,
University of California, Berkeley. (By title.)

The attendance was about 90.

The second part of the afternoon session was the Business Meeting. Professor
Abraham Wald, President of the Institute, presided. It was recommended that
two meetings a year be held on the West Coast, one in June in the San Francisco
Bay Area, alternating between Berkeley and Stanford, and the other during the
winter, alternating between the North and Los Angeles. The next West Coast
meeting will be held during the Thanksgiving recess at Seattle.
TESTING COMPOUND SYMMETRY IN A NORMAL
MULTIVARIATE DISTRIBUTION 


By David F. Votaw, Jr. 

Yale University 

Summary. In this paper test criteria are developed for testing hypotheses
of “compound symmetry” in a normal multivariate population of $t$ variates
($t \ge 3$) on the basis of samples. A feature common to the twelve hypotheses
considered is that the set of $t$ variates is partitioned into mutually exclusive subsets
of variates. In regard to the partitioning, the twelve hypotheses can be divided
into two contrasting but very similar types, and the six in one type can be paired
off in a natural way with the six in the other type. Three of the hypotheses
within a given type are associated with the case of a single sample and moreover
are simple modifications of one another; the remaining three are direct extensions
of the first three, respectively, to the case of $k$ samples ($k \ge 2$). The gist of any
of the hypotheses is indicated in the following statement of one, denoted by
$H_1(mvc)$: within each subset of variates the means are equal, the variances are equal
and the covariances are equal, and between any two distinct subsets the covariances
are equal.

The twelve sample criteria for testing the hypotheses are developed by the
Neyman-Pearson likelihood-ratio method. The following results are obtained
for each criterion (assuming that the respective null hypotheses are true) for
any admissible partition of the $t$ variates into subsets and for any sample size,
$N$, for which the criterion's distribution exists: (i) the exact moments, (ii) an
identification of the exact distribution as the distribution of a product of independent
beta variates, (iii) the approximate distribution for large $N$. Exact
distributions of the single-sample criteria are given explicitly for special values
of $t$ and special partitionings.

Certain psychometric and medical research problems in which hypotheses of
compound symmetry are relevant are discussed in section 1. Sections 2-6 give
statements of the hypotheses and an illustration, for $H_1(mvc)$, of the technique
of obtaining the moments and identifying the distributions. Results for the
other criteria are given in sections 7-8. Illustrative examples showing applications
of the results are given in section 9.

1. Introduction. In studying psychological examinations, or other measuring
devices, one may wish to test whether several forms of an examination may be
used interchangeably. Consider the case of three forms, and assume that
scores of individuals on the three forms have a normal 3-variate distribution.
The hypothesis of interchangeability is equivalent to the hypothesis that in the
normal distribution the means are equal, the variances are equal, and the covariances
are equal. When this hypothesis is true, the normal distribution is invariant
over all permutations of the variates and is said to have complete symmetry.
It is frequently more important, however, to test not only that the forms
have completely symmetric relations with each other but also that they are
interchangeable with regard to their relation to some outside criterion measure (e.g.,
the criterion might be skill in a given task). Assuming that the scores of individuals
on the three forms and the criterion have a normal 4-variate distribution,
the hypothesis of interchangeability is equivalent to the hypothesis of
equality of means, equality of variances, and equality of covariances among the
three forms and equality of covariances between forms and criterion. When
this hypothesis is true, the 4-variate normal distribution is invariant over all
permutations of the three variates (associated with forms) among themselves,
and so the variance-covariance matrix has the following form:


$$
\begin{pmatrix}
A & C & C & C \\
C & B & D & D \\
C & D & B & D \\
C & D & D & B
\end{pmatrix}
$$


where the quantity $A$ represents the variance of the criterion measure. A
normal distribution for which this hypothesis is true is said to have compound
symmetry (of type I). A more general case of compound symmetry (of type I)
arises when there are several examinations (no two of which need have the same
number of forms) and several outside criteria.
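To make the type I pattern concrete, the following Python sketch (assuming NumPy; the function names and numerical values are illustrative, not taken from the paper) builds the $4 \times 4$ matrix above from hypothetical values of $A$, $B$, $C$, $D$ and checks that it is unchanged by every permutation of the three forms among themselves, which is the invariance property described in the text.

```python
import itertools

import numpy as np


def type_one_cov(a, b, c, d):
    """4 x 4 covariance with compound symmetry of type I (hypothetical helper).

    Variate 0 is the outside criterion; variates 1-3 are the three forms.
    a: variance of the criterion         b: common variance of the forms
    c: common criterion-form covariance  d: common covariance between forms
    """
    sigma = np.full((4, 4), float(d))
    np.fill_diagonal(sigma, b)
    sigma[0, :] = c
    sigma[:, 0] = c
    sigma[0, 0] = a
    return sigma


def invariant_under_form_permutations(sigma, tol=1e-12):
    """True if every permutation of the forms (variates 1-3) leaves sigma unchanged."""
    for perm in itertools.permutations([1, 2, 3]):
        order = [0, *perm]  # the criterion stays in position 0
        if not np.allclose(sigma[np.ix_(order, order)], sigma, atol=tol):
            return False
    return True


sigma = type_one_cov(a=4.0, b=2.0, c=0.5, d=0.8)  # illustrative values only
print(sigma)
print(invariant_under_form_permutations(sigma))  # True
```

The permutation check is brute force over all $3! = 6$ orderings, which is fine for three forms; for larger subsets it is cheaper to check the equalities directly.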
The hypothesis of complete symmetry may arise in certain medical-research
problems. For example, suppose a measurement (e.g., %CO$_2$ in blood) is made
at each of three times (say $T_1$, $T_2$, $T_3$) on an experimental animal and assume
that the three quantities have a normal 3-variate distribution; one may then be
interested in testing the hypothesis of complete symmetry on the basis of
measurements (considered as a random sample) made on several experimental animals.
More generally, let there be two characteristics, say $U$ and $W$ (e.g., %CO$_2$ in
blood and %O$_2$ in blood), which are both measured at each of two times, $T_1$,
$T_2$. Let it be assumed that the four quantities, which we represent as $UT_1$,
$UT_2$, $WT_1$, $WT_2$, have a normal 4-variate distribution. One may then be
interested in testing the hypothesis that the means of the first two variates are
equal, the means of the second two are equal, and the variance-covariance matrix
has the form:


$$
\begin{pmatrix}
E & F & K & L \\
F & E & L & K \\
K & L & G & J \\
L & K & J & G
\end{pmatrix}
$$


When this hypothesis is true, the 4-variate distribution is said to have compound
symmetry (of type II). A more general case of compound symmetry (of type II)
arises when there are $h$ characteristics and $n$ times ($h, n = 2, 3, \ldots$).
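A minimal sketch of the type II pattern, assuming NumPy and using a hypothetical helper name: each $n \times n$ block of the matrix has one value on its diagonal (same time) and one value off it (different times), so the whole matrix can be written as a sum of two Kronecker products. The numerical values in the usage lines are arbitrary illustrations, chosen only so that $E, F, G, J, K, L$ are distinct.

```python
import numpy as np


def type_two_cov(same_time, diff_time, n):
    """Covariance with compound symmetry of type II, h characteristics x n times.

    same_time[u][w]: common covariance of characteristics u and w measured at the
        same time (the variance E or G when u == w, K between characteristics).
    diff_time[u][w]: common covariance at two different times (F or J within a
        characteristic, L between characteristics).
    Variates are ordered characteristic by characteristic:
        (U T1, ..., U Tn, W T1, ..., W Tn, ...).
    """
    s = np.asarray(same_time, dtype=float)
    o = np.asarray(diff_time, dtype=float)
    # Block (u, w) is o[u, w] * J + (s[u, w] - o[u, w]) * I: s[u, w] on its
    # diagonal and o[u, w] everywhere else (J = all-ones matrix).
    return np.kron(o, np.ones((n, n))) + np.kron(s - o, np.eye(n))


E, F, G, J, K, L = 5.0, 2.0, 4.0, 1.0, 1.5, 0.5  # illustrative values only
sigma = type_two_cov([[E, K], [K, G]], [[F, L], [L, J]], n=2)
swap_times = [1, 0, 3, 2]  # exchange T1 and T2 for both characteristics at once
print(np.allclose(sigma[np.ix_(swap_times, swap_times)], sigma))  # True
```

The usage line illustrates that a type II matrix is unchanged when the times are permuted simultaneously for every characteristic, not when variates are permuted freely.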
Either of the two types of compound symmetry is a direct extension of complete
symmetry. Wilks [5] has thoroughly treated the sampling theory of certain
criteria for testing various hypotheses of complete symmetry regarding a normal
multivariate distribution.

The problems dealt with in this paper are: (i) to give sample criteria for
testing hypotheses of compound symmetry regarding a normal multivariate
distribution, and (ii) to give the moments and identify the distribution of each
sample criterion when the corresponding hypothesis is true.

The hypotheses are stated in section 2. Certain properties of compound symmetric
normal distributions are given in section 3. Sections 4, 5, and 6 together
give the method of deriving each sample criterion and the methods of obtaining
the criterion's moments and identifying its distribution; the methods are
illustrated for one of the hypotheses. Sections 7-8 give the other criteria and their
moments together with approximate distributions of the criteria for large sample
sizes. Exact distributions of some of the criteria are given in section 7g for
certain special compound symmetries. Section 9 contains two illustrative
examples.

2. Statements of hypotheses. Let $\Pi$ be a normal $t$-variate population and
$X_i$ ($i = 1, \ldots, t$; $t \ge 3$) be the $i$-th variate in $\Pi$. Let the set of variates
$X_1, X_2, \ldots, X_t$ be partitioned into $q$ mutually exclusive subsets of which, say,
$b$ subsets contain exactly one variate each and the remaining $q - b = h$ subsets
(where $h \ge 1$) contain $n_1, n_2, \ldots, n_h$ variates, respectively, where $n_\alpha \ge 2$
($\alpha = 1, \ldots, h$; $b + \sum_{\alpha=1}^{h} n_\alpha = t$). No generality is lost in assuming that the $t$
variates are ordered so that the first $b$ belong to the $b$ subsets containing one
variate each, the next $n_1$ variates belong to the $(b + 1)$-th subset, $\ldots$, the last $n_h$
variates to the $q$-th subset, where $n_1 \le n_2 \le \cdots \le n_h$. Let $(1^b, n_1, n_2, \ldots, n_h)$
represent such a partition of the variates $X_1, \ldots, X_t$ into subsets; when $b = 0$
the term $1^b$ will be omitted. The notation can be simplified when
$n_1, n_2, \ldots, n_h$ are not all unequal, e.g., $(1^b, 2, 2)$ can be written as $(1^b, 2^2)$.
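The partition notation can be produced mechanically. The helpers below are hypothetical illustrations, not part of the paper: the first takes the list of subset sizes, sorts them so that singletons come first and $n_1 \le \cdots \le n_h$, and collects repeated sizes into exponents; the second returns $(t, q, b, h)$.

```python
from collections import Counter


def partition_notation(sizes):
    """Write subset sizes compactly, e.g. [1, 1, 1, 2, 2] -> '(1^3, 2^2)'.

    Sizes are sorted ascending, so singletons come first, matching the
    ordering convention n_1 <= n_2 <= ... <= n_h. An exponent of 1 is omitted.
    """
    counts = Counter(sizes)
    if any(size < 1 for size in counts):
        raise ValueError("subset sizes must be positive")
    parts = []
    for size in sorted(counts):
        m = counts[size]
        parts.append(f"{size}^{m}" if m > 1 else str(size))
    return "(" + ", ".join(parts) + ")"


def partition_summary(sizes):
    """Return (t, q, b, h) for a partition given by its subset sizes."""
    sizes = list(sizes)
    b = sum(1 for size in sizes if size == 1)  # subsets with exactly one variate
    q = len(sizes)
    return sum(sizes), q, b, q - b


print(partition_notation([1, 1, 1, 2, 2]))  # (1^3, 2^2)
print(partition_summary([1, 1, 1, 2, 2]))   # (7, 5, 3, 2)
```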

In the statement of each of the following six hypotheses it is assumed that there
is a preassigned partition $(1^b, n_1, n_2, \ldots, n_h)$ of the $t$ variates into $q$ subsets
($q = b + h$).

(1) $H_1(mvc)$: The hypothesis that within each subset of variates the means
are equal, the variances are equal, and the covariances are equal and that
between any two distinct subsets of variates the covariances are equal.

(2) $H_1(vc)$: The hypothesis that within each subset of variates the variances
are equal and the covariances are equal and that between any two distinct
subsets of variates the covariances are equal.

(3) $H_1(m)$: The hypothesis that within each subset of variates the means are
equal, given that $H_1(vc)$ is true.

(4) $H_k(MVC \mid mvc)$: The hypothesis that $k$ normal $t$-variate distributions are
the same, given that they all satisfy $H_1(mvc)$ for a particular division of the
variates into subsets ($k \ge 2$).
(5) $H_k(VC \mid mvc)$: The hypothesis that $k$ normal $t$-variate distributions have
the same variance-covariance matrix, given that they all satisfy $H_1(mvc)$ for a
particular division of the variates into subsets ($k \ge 2$).

(6) $H_k(M \mid mVC)$: The hypothesis that $k$ normal $t$-variate distributions are
the same, given that they all satisfy $H_1(mvc)$ for a particular division of the
variates into subsets and that they all have the same variance-covariance matrix
($k \ge 2$).

Any of the hypotheses stated above can be expressed in terms of an invariance
condition on the normal $t$-variate distribution (or distributions); e.g., $H_1(mvc)$
is equivalent to the hypothesis that the distribution is invariant over all
permutations of the variates within subsets. The pattern of symmetry present in the
variance-covariance matrix of the distribution when any of the above six
hypotheses is true is given in section 3 (see (3.2)).
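For known parameter values, the pattern stated by $H_1(mvc)$ can be checked directly. The Python sketch below (assuming NumPy; all function names are hypothetical) takes ordered subset sizes such as `[1, 3]` for the partition $(1, 3)$ and tests the equalities within and between subsets. It only inspects given parameters; it is not the likelihood-ratio test developed in the paper, which works from sample data.

```python
import numpy as np


def subsets_from_sizes(sizes):
    """Consecutive index groups for ordered variates, e.g. [1, 2, 2] -> [[0], [1, 2], [3, 4]]."""
    groups, start = [], 0
    for size in sizes:
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def all_equal(values, tol):
    """True if all entries agree to within tol (an empty set counts as equal)."""
    v = np.asarray(values, dtype=float).ravel()
    return v.size == 0 or (v.max() - v.min()) <= tol


def satisfies_h1_mvc(mu, sigma, sizes, tol=1e-9):
    """Check whether given (mu, sigma) have the pattern stated by H_1(mvc)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    groups = subsets_from_sizes(sizes)
    for g in groups:
        block = sigma[np.ix_(g, g)]
        off_diag = block[~np.eye(len(g), dtype=bool)]
        # Within a subset: equal means, equal variances, equal covariances.
        if not (all_equal(mu[g], tol) and all_equal(np.diag(block), tol)
                and all_equal(off_diag, tol)):
            return False
    # Between two distinct subsets: all covariances share one value.
    for i, g in enumerate(groups):
        for other in groups[i + 1:]:
            if not all_equal(sigma[np.ix_(g, other)], tol):
                return False
    return True


mu = [10.0, 3.0, 3.0, 3.0]  # criterion mean, then three form means (illustrative)
sigma = np.array([[4.0, 0.5, 0.5, 0.5],
                  [0.5, 2.0, 0.8, 0.8],
                  [0.5, 0.8, 2.0, 0.8],
                  [0.5, 0.8, 0.8, 2.0]])
print(satisfies_h1_mvc(mu, sigma, sizes=[1, 3]))                    # True
print(satisfies_h1_mvc([10.0, 3.0, 3.5, 3.0], sigma, sizes=[1, 3]))  # False
```

Checking the equalities directly is equivalent to the permutation-invariance statement in the text but avoids enumerating all within-subset permutations.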

Six additional hypotheses, $\mathcal{H}_1(mvc), \mathcal{H}_1(vc), \ldots, \mathcal{H}_k(M \mid mVC)$, which are
modifications of $H_1(mvc), \ldots, H_k(M \mid mVC)$, respectively, will also be
considered. In regard to any of these six $\mathcal{H}$ hypotheses, it is assumed that there
is a partition $(n^h)$ ($n = 2, 3, \ldots$) of the $t$ variates ($t = nh$) and that in each subset
the variates are in a given order; thus each subset has $n$ variates and between
any two distinct subsets of variates there are $n^2$ covariances, which form an $n \times n$
“block” in the variance-covariance matrix of the distribution (see (3.4)). The
hypotheses may now be stated as follows:

$\mathcal{H}_1(mvc)$: The hypothesis that within each subset of variates the means are
equal, the variances are equal, and the covariances are equal and that between
any two distinct subsets of variates the diagonal covariances are equal and the
off-diagonal covariances are equal.

$\mathcal{H}_1(vc)$: The hypothesis that within each subset of variates the variances are
equal and the covariances are equal and that between any two distinct subsets
of variates the diagonal covariances are equal and the off-diagonal covariances are
equal.
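The covariance pattern of the hypothesis just stated can likewise be checked for given parameter values. In this sketch (assuming NumPy; the function name is hypothetical) the variates are ordered subset by subset, each in its given order, so that the partition $(n^h)$ cuts the matrix into $n \times n$ blocks; every block, within or between subsets, must have a single diagonal value and a single off-diagonal value. The example matrix is the type II matrix of section 1 with arbitrary illustrative values.

```python
import numpy as np


def _equal(values, tol):
    """True if all entries agree to within tol (an empty set counts as equal)."""
    v = np.asarray(values, dtype=float).ravel()
    return v.size == 0 or (v.max() - v.min()) <= tol


def satisfies_type_two_vc_pattern(sigma, n, tol=1e-9):
    """Check that every n x n block of sigma has one diagonal and one off-diagonal value.

    Block (u, w) is sigma[u*n:(u+1)*n, w*n:(w+1)*n]; for u == w this covers the
    within-subset variances and covariances, for u != w the diagonal and
    off-diagonal covariances between two distinct subsets.
    """
    sigma = np.asarray(sigma, dtype=float)
    t = sigma.shape[0]
    if t % n:
        raise ValueError("t must be a multiple of n")
    h = t // n
    off_diag = ~np.eye(n, dtype=bool)
    for u in range(h):
        for w in range(h):
            block = sigma[u * n:(u + 1) * n, w * n:(w + 1) * n]
            if not (_equal(np.diag(block), tol) and _equal(block[off_diag], tol)):
                return False
    return True


E, F, G, J, K, L = 5.0, 2.0, 4.0, 1.0, 1.5, 0.5  # illustrative values only
sigma = np.array([[E, F, K, L],
                  [F, E, L, K],
                  [K, L, G, J],
                  [L, K, J, G]])
print(satisfies_type_two_vc_pattern(sigma, n=2))  # True
```

For the same matrix, the corresponding unmodified hypothesis with partition $(2, 2)$ would additionally require $K = L$, since it asks all four covariances between the two subsets to be equal.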

The statement of any of the hypotheses $\mathcal{H}_1(m)$, $\mathcal{H}_k(MVC \mid mvc)$, $\mathcal{H}_k(VC \mid mvc)$,
and $\mathcal{H}_k(M \mid mVC)$ is obtained from the statement of the corresponding $H$
hypothesis by simply substituting $\mathcal{H}$ for $H$. The pattern of symmetry present
in the variance-covariance matrix of the distribution when any of the six $\mathcal{H}$
hypotheses is true is given in section 3 (see (3.4)), from which the appropriate
invariance condition on the normal distribution can be obtained.

A test of any of the hypotheses $H_1(mvc)$, $\mathcal{H}_1(mvc)$, $H_1(vc)$, $\mathcal{H}_1(vc)$, $H_1(m)$, $\mathcal{H}_1(m)$
is based on a random sample from a normal multivariate distribution; a test of
any of the remaining hypotheses is based on $k$ random samples from $k$ normal
multivariate distributions, respectively ($k \ge 2$).

A normal distribution for which an $H$ or $\mathcal{H}$ hypothesis is true will be called
compound symmetric. In the special case where the compound symmetry holds
for a partition $(t)$ of the $t$ variates, any $H$ hypothesis and the $\mathcal{H}$ hypothesis
corresponding to it are identical; in this case the normal distribution will be
called completely symmetric. Problems (i) and (ii) (see section 1) have been
solved completely by Wilks [5] for $H_1(mvc)$, $H_1(vc)$, and $H_1(m)$ for the case of
complete symmetry.

3. Block symmetric matrices and vectors. Let be the mean value of X, 

and 11 || be the variance-covaiiance matrix of X t , • , X t (i, j = 1, ■ • , t) 

(p,, is the coefficient oi correlation between X, and X 3 ). The joint probability 
density function 1 of Xi, X 2 , • , X t is 

(3 1) f(Xi , X ,, , Xt) = I (?,; | 1/2 ^- f/2 exp [-£ (E(X, - m,)(X, - «,)], 

where || G„ || is positive definite and its inverse || G 1,1 || = || 2 p.jtr.ir s j|. 

When any of the Ii hypotheses is true (sec section 2), we represent || G' 1 || 
by || A'* || (also || G,, || by || A„ || ) which can be written as (3.2) (see page452), 
where A aa ‘ = A a “ (s, s' = 1, • ,b) and XT' = D n ’\a, a! = 1, ■ • • , li, a * a'). 
The A’b and B s with single superscripts and the C’s and D’s have been intro¬ 
duced to indicate the block ‘pattern clearly In general C sa = C as only if 
a = s{s = 1, • • , 5; a = 1, • • , h) . ||A„ || and || A ,J [| have the same 
block pattern of symmetry. 

The blocks in (3 2) are formed by making a partition (l. 6 , n v , n ,, , n*) of the 

i lows and t columns of [| A" || A matrix having the block pattern of sym¬ 
metry of (3.2) will be called block symmetric of type I. Clearly a block symmetric 
matrix of type I is invariant over all permutations of its rows and columns within 
the subsets determined by (l 6 , % , • , n h ), if the row interchanges and column 

interchanges are the same Also, a f-component vector will be called block 
symmetric if the order of values of the components is invaiiant over all permuta¬ 
tions of the components within groups determined by (1 , m , • , n h ). 

The determinant of the block symmetric matrix || A,, || is 

(3.3) | A,; | = K ft (A* - B a ) n ‘-\ 

where 





C'n 

Cl, ■■ 

cL 


A s ,' 


cL 

cL ■ 

cL 

Cu 

Cn ■ 

G[ i 

Ai 

Dl 2 • 

Dl h 

C'n 

C'n 

C[ 2 

Dli 

Al ■ 

D', h 

Cu, 

Cu ■ 

cL 

Dh 

dL 

• Al 


1 In general a chance quantity and the variable of its distribution function will be de¬ 
noted by the same symbol. 
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where C sa = C sa Vn«, A a = A a + (n 0 - l)£ a , and D' aa . = D aa . V n a n a > 

( s = ,a' ,b l^ ,CL ' a =: “ ’ b \ a ^ a ^> ^ ss '> B OJ and D aa » are the cofac- 

tors of -4" , C s “, A a , B a , and D aa ', respectively in (3.2). The ellipsoid, defined by 
— m,)(X, — m,,) = r 0 (r 0 fixed and > 0), has ( n a — 1) axes of equal 
length (o = 1, ■ • , h ), and each of the remaining q axes is inclined to the co- 
oidinate axes so that its direction cosines have the same block symmetry as the 
set of diagonal elements in (3 2). 

When any of the H hypotheses is true, we represent || G 11 || by || A l} || (also 
]| Gu || by || Atj ||) which can be written as 


(3 4) || A” || = 


A 1 

B 1 ■ 

E l 

C 12 

D u ■ 

B 12 


c lh 

D lh 

D lh 

B 1 

A 1 ■ 

B 1 

B 12 

C li ■ 

B 12 


B u 

C lh 

B lh 

B l 

B l • 

■ A 1 

D vi 

D n 

c n 


D ih 

D ih ■ 

■ C lh 

C 21 

B 21 . 

B 21 

A 2 

B 2 • 

• B 2 


e h 

D Vl ■ 

B 2 \ 

1J 1 

e i • 

• B 21 

B 2 

A 2 • 

• B 2 


if h 

c ih ■ 

• Tf h 

B 21 

&■ • 

C n 

B l 

B 2 ■ 

■ A 2 


B 2h 

D ih • 

• c 2h 

■ 

' 



c M 

D hl 

■ D hl 

e h 

D 2h ■ 

. d 2Iv 


A" 

B h • 

.. B h 

B u 

C h 1 

. B ld 

B 2 ' 1 

c ih 

tf h 


B h 

A h ■ 

• • B h 

B hl 

B M • 

■ C u 

D 2h 

D 2h 

. c- h 


B h 

B h ■ 

• - 


where the blocks are formed by a partition (n h ) of the t rows and t columns; thus 
each block is an n X n array || A 11 1[ and || A tJ |j have the same block pattern 
of symmetry. 

A matrix having the block pattern of symmetiy of (3.4) will be called block 
symmetric of type II. The determinant of || A„ || is 

I, | = K^Q, 


(3.5) 
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where 


41 — Hi Cj2 — H/o 
— Du -I2 — /L 


On, — Du t 
£ 2/1 — -OaA 


Ch 1 — H/u ^.2 — S/,2 • • -<4* — Eh I) 

-4l Cli • ‘ ■ Cu , 

&a At ••• CL 


I Cm C/,5 • • A h I, 

where A.' a — A a + (re — 1 )B a and G' aa - = C an - + (re — X)S ao , (a, a' - 
1, 2, ■ , h\ a a'), A a , E a , G a a' , and S 0 „. arc the cofactors of A" , E", C" ,l> , 

S’"", respectively, m (3.4). 


4. Method of obtaining the sample criteria. The probability distribution, 
P, of a simple, random sample, say Oy(Xi a , X« a , • , X, K )(a = 1, 2, • • • , N), 

from II is 

(4.1) P = G„ |* /2 exp [-£ G„(: X„ - m t )(X ia - m,)\. 

».J.a 

For 0^ fixed, P is the likelihood function of the parameters to, , ?/i 2 , • , m t , 

and G,/ ( 5 , j = 1, 2, ■ • , i). To obtain sample criteria for testing the Ii and R 
hypotheses we shall employ the Neyman-Pearson likelihood-ratio method. The 
details of applying this method will be given for only one of the hypotheses, since 
the technique of application is the same for all the hypotheses under 
consideration. 

In applying the likelihood-ratio method we maximize P under two different 
sets of conditions and form the ratio of the two maxima To derive a criterion 
for, say, Hiirnvc), we first maximize P over the set, £2, of admissible values of the 
parameters m (4 1), secondly, we maximize P over the set, w, of admissible values 
of the parameters in (4.1) that satisfy Hi(mvc) Let P a and P u be these maxima, 
respectively. The likelihood-ratio criterion for Hi(mvc) is \i(mvc) = P u /P n; 
thus 0 < 'hi(mic) < 1 The sample criterion, L\(mvc), for fh(mvc) will bo chosen 
as a single-valued function of Xi(mrc) 


4a. Derivation of the criterion L\{mvc). The parameter spaces, 42, and, os, can 
be specified as follows: 

j( 1) 11 Gi t 11 positive definite, 

1(2) - » < m, < + co (i = 1, 2, • • • , t), 

(1) |[ri tJ || positive definite and block symmetric (of type I); 

( 2 ) — 00 < m v < + 00, (mi , m 2 , • • , m,) block symmetric. 


CO 
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The block symmetries in w(l) and u(2) are for the same partition (l\ ni, ■ ■ ■ , m) 
of the t variates (see sections 2 and 3). 

Maximizing P is equivalent to maximizing 


(4.2) 


L = In P = -(M/2) In v + (N/ 2) In | G„ \ 


Solving the simultaneous equations 3 L/dnu = 0(t = 1, ■ ■ ■ , t) and dL/dG t , - 
0 (i, j = 1, ■■■, t; i < j) for m and (2 1J , we have 


(4 3) 


m, = (1/JV) E X,« = X„ 

a=l 

(N / 2) G 11 = E (Z„ - XXX, a - X,) = »„ , 

a sal 


substituting these values of the parameters into (4.1) we find that 
(4 4) Pa = iT m, \2/N) m | «,„ | ~ m exp [-Nt/ 2], 

In (4.3) and (4 5) each expression at the extreme right is defined by the corre¬ 
sponding expressions at the left 

In w(2) there are b + h groups of means, the means within a group being all 
equal; let m, be the s-th mean and m, a be the common value of the means m the 
(I a)-th group. Solving the simultaneous equations 3 L/dm s = 0, 
dL/dm'ra = 0, 3L/3A sa , = 0, dL/bC m - 0, dL/5A a = 0, dL/dB a = 0, 
dL/dD aa . = 0(a, s' = l, ■■ ,b,a,a' = 1, , fc; a * a'), we find that 

, , K S 1 ~ ' 

( ' ih' ra = (1 /Nn a ) E X„ a = x' u , 

a,i a 

{NJ2)A. 3 ’ — E (Xia ~~ X,)(X 3 'a — X s >) = V s s' , 

(N/2)C‘ a = (l/n„) E (X.« - X s )(Xr a - X'rJ = w'.a, 

a i a 

(. N/2)A a = (1 /n a ) E (X- - x r j = Va, 

a ,v 0 

( N/2)B a = [l/UaXa. ~ 1)] E (X, a a — X ra )(X u * — X r J = , 

Jo- 

(,N/2)D ml> = (1 /n a n a ’) E (X,„a — X T J{X la - a — X r J = z aa ’, 

where i a > ja = b H a 1, ? & J ^ ^ Ja I ^ ~ i 

ni = 0; a, a' = 1, • ■ , h) a ^ a'. 

When Hi(moc) is true, the maximum likelihood estimates of m,, <r,, and 
j as i j ... 3 f) would be obtained by means of (4.5) and the definition of 

|| A v ' || given just after (3 1). 

Substituting the expressions in (4.5) into (4.1) we find that 

(4 6) p„ = *- m,i i r w/2 m) m exp [-mm, 

where 
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From (4.4) and (4.6) it follows that the likelihood-ratio criterion for Hi(mvc) is - 
( 4 8) Xi(m#e) = [ I I / | v[, \ ] vn , (i,j = 1, ,<) 

Finally, as the sample criterion for Hi(mvc) we choose 
(4 9) Li(mvc) = |Xi (mvc)f lll) = [ | v t] | / | v[, \ ] 


4 b. Preliminary calculations for evaluation of moments of L\(mvc) The deter¬ 
minant | v[ } | in (4.9) is block symmetric From (3.2), (3 3), and (4 9) it follows 
that ■ 


(4.10) 

where 


Li(mvc) - 


n u - w:r< n ‘~ i) 

.a"! 


i " ri 
\v T r' ' y 


// 

Vsb' ~ ] 

VaT a = 'U'aa Tia > 

v" a r a = K + (n a - l)wia; 
Vr a r a’ ’\/n a n a i Zaa’ , 


(s, s' = 1, • ■ , b; r, r' = 1, • • , b + 7i; r a = b + a; a = 1, ■ • , h) 

N 

Let F ia = X ta - to, and F, = (1 /N) E Y ta> (i = 1, ■ • ,0 Clearly 

«=1 

N f f 

Vi, = E (Yu — Yf)(Y ]a — Yj). When Ih(mvc) is true, uL ,v' ai w' a , and z ( n >, 

m L\(mvc), can be expressed exactly as they are expressed in (4 5) with Y sub¬ 
stituted for X, and (v' a — w' a ) and v" r ’ in (4 10) can be expressed as follows: 

Va - W a = (1 /nf) {E^.. “ [l/(tt 0 - 1)] E 

l a9^1a 

+ (N/n a ) E Y\ - [N/n a (n a - 1)1 E Y la Y ]a ; 

i a la^la 

It 

Vg 8 f , 

v!r a = (l /Vn a ) Eb> a , 

*0 

v r„ra = (l/™«) E ^lo!a J 

»a.Io 

Cra' = (l/V»a«a') E 

tai?a ^ 

From (4 10) and (4.11) it follows that when Hi(mvc) is true, each element of the 
determinants on which Li(mvc) depends consists of: (a) a quadratic form in Y, 
and a linear function of the v tl , or (b) merely a linear function of the 

Vij (b J = j ^) - 

The j oint probability density function of the v tJ and Y , is 
(4.12) fMg(Yi , , Yt), 
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where 

I rl |(JV-l)/2 | |(tf-f-2)/2 r V 1 n 

I I I | GXp l 2—/ GijV ijj 

/( '" ) ("Ai) r ) . . r ("Hi)’ 

(I | Qx, 11 positive definite; N > f), which is the Wishart distribution [9, p. 120], and 

g(? 1 , • • ■ ,n = | <?., | 1/2 W"V‘ /2 exp I-AT £ <?., y,- 7,1 = g(Y), say, 

* 1 

which is a normal f-vanate distribution. The rf-th moment (d — 0, 1, ■ • ) 
of Li(mvc), when H\(mvc ) is true, is 


(4.13) 


ElUmvc)}* = f f(v t] )g(Y) |».,|Vrr‘ 

Jr 


■n u - (rid^n^, 


where the domain, R, of integration is — °° < 7, < + °o 11 v^, \ | positive semi- 
definite (i, j = 1, ■ • , t). The integral in (4.13) is evaluated in section G (by 
means of Wilks’ moment-generating operators) for the case where H^mvc) 
is true. 


5. Remarks on Wilks’ Moment-Generating Operators. Wilks’ operators 
are applicable to a far wider class of problems than those treated in this paper 
The following discussion is confined to a special use of the operators. 

Prom (4.12) it follows that 


(5 1) 


/„ 


I v,j | (iV ‘ 2)/2 exp [- £ G„v „1 XX dv„ 

_hi_ »a i 

n r[(7V - 0/2] 

l-l 


G v 


I—(JV—D/2 


where R' is the region in the space of v tJ for which |] v,, 11 is positive definite, and 
II G t] || is positive semi-definite (Of course, the probability that || v,j || is not 
positive definite is 0.)^ Let G‘i, = (?,, + 0„(», j = 1, •••,<); if all the /3 VJ are 
sufficiently small, || Gi, || is positive definite, and we have 

I »ii exp [- 5Z G„ v„] II dv tj 

| Q 1 j(JV—1)/2 / __hi_f*2»_ 

(5-2) ” Jh ' -q r[(jv _ i)/2] 

1-1 

_ I p I (JST—1)/2 | pt I'—(iV—1) /2 

winch is Big), where g = exp [-J2 $,&,]. 

1 l,J 

Let Iij be an operator (whose operand is a function of all the /3„) which repre¬ 
sents the following set of operations: (a) replacement of each /?,/ in the operand 
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by 5.1 + £,fi, (b) integration (of the lesult of (a)) with respect to = 1, ■ ■ , J) 
from — oo to + * , (c) multiplication of the result of (a) and (b) by From 

(3.1) it follows that 

(5.3) = I\, (exp [- X) 0.,«>.J) = g | v ti r 1/2 , (|| v„ || pos. def.); 


and if all the 0„ are set equal to 0 after performing the /-operations, then <7 — 1 
and (5.3) yields | ?;„■ | -1/2 . Let /(, be X [repetitions (X = 1, 2, • • • ) of /],■. 
Clearly, 


(5.4) 


Id,,-. = E[g\ Vl] n | PlI .o = E[\v t] r /2 ]. 


Under all conditions of their use in this paper the I operations are interchangeable 
with the E operation [8; p. 316]; thus, 

wig] = ll[E(g)} 


From (5 2), (5.4), and [8, pp. 318-320] we have 


e[ i r x/a ] = i <?.. 


|(X-l)/2 


m i <?:, 


' l—(V—l)/2i 


k,-o 


(5 5) 


= I G, 


|X/2 


Tim - i, -m, 


where N > t + X + 1 and i p(R, S) = T j V (^j . 

The operator /„ may be used, as indicated above, to find negative half-integer 
moments of | | To obtain positive half-integer moments of | | we may 

use an inverse operator 77* [8, pp. 321-323] (X = 1, 2, • ) which has been 

defined in such a way that 


(5.6) 


| <?„ r i)/z 




G> i-C^D/i, 


|P.3=° 


E[\ | X/2 J 

= i G tl r /2 (n m -1, x]). 


The equality between the second and third expressions in (5.6) can be obtained 
from (5.1) by replacing N by N + X (see [7]). 

In (5.5) and (5.6) the 0’s are not necessary; however, in (4 13) and in similar 
expressions for the moments of the other criteria there are several determinants, 
each determinant requires a distinct /-operator, and it is of great convenience to 
introduce a distinct set of 0’s for each I operator. The 0’s associated with a 
given operator may initially appear m more than one of the determinants m the 
operand. The order in which several /-operators are used is illustrated in the 
following case for two: 

(5.7) ui i <?:,• \~ k, (w i gc, r") ifcj-oi k,=o. 

where A, p > 0 and the values of ¥ and k" are such that the value of the expres¬ 
sion is well defined. The notation in (5 7) means that Z,, p is applied to | G ,, | , 

the 0’s associated with /7, p are set equal to zero, then l\, is applied to the product 
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of | G[j \~ l> and the results of the previous operations, and then the p’s associated 
with 7i, are set equal to zero. The interchangeability of the order of 7 opera¬ 
tions is discussed in [8, p. 324], 


6. The moments and distribution of L\ (rnvc ) when Hi (mvc ) is true. To 
evaluate (4.13) we let 

(6.1) g = exp [- £ /M., - Z p' a (v' a - ia') - £ v ' r ,], 

*.j a nr' 

From (4.11) and (4.12) we have 

(6 2 ) e( 0 ) = i a u r1 a:, 1 a 1 :, r , 

where 


a’,,, 

= 

A ai i 

+ 

Pas' + P'L, 


dll, 

= 

c„ 

+ 

P»i a "j" P&rJ'\/W‘c.) 


^laia 

- 

A a 

+ 

Plata H“ fia/fta PraTa/flo. , 


Xf. 

= 

B a 

+ 

Piaja Pa/fya l)a a -f" Pr a ra/^a 


= 

Daa‘ 

■ + 

Plata* Pr a T a '/) 

(a 9 ^ 

/las' 

= 

A aa i 

I 



*tt 

= 

c m: 

1 



A H 

At a i a 

= 

A a 

+ 

Pa/lla ) 


jl *n7o 

= 

B a 

~ 

Pa/llafya. l), {l a 7^ Ja), 


A" , 

‘“■'ala' 

= 

D aa 

’ 7 

(a ^ a'). 



When Hi(mvc) is true, we have 


E[Li(mvc)} d = | A tl 


2ST/2 


\a':,\[jgui?\au 

«=i 


-(W-l/2)', 



(6 3) 


= {iW - i, 2d)||n^(W + 2d - r, - 2i)j 
X {Um + 2 d)(n a - 1), - 2 d(n a - l)]||ll(n a - 


(d - 0, 1, 2, ■ • • ; N > t), 

where q = b + h and S') is defined in (5 5), In (6.3) the assumption that 
Hi(mvc ) is true implies that after we apply 77, 2d and set the p i} equal to 0 all 
remaining determinants are block symmetric; we may then use (3 3) before, 
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applying if- and /^ (n ° 1) ; (a = 1, • ■ , h) The expression in (0 3) may be 
■written as follows 1 


(6.4) 


E[Lt(mvc)Y 
'N - i 


t r 

= |.S+« ~ „(N - i 


+ d 


> 


p (N(na - 1) 

n_i_ i _ 

~ r (*^+d<*-u 


(n (n B - 


h na — 1 

=nn 

0“>1 fi a ==l 


f (N_ - q - Sa - n a + a - 1 ^ ] 

( S ;- ij 

7T + 


2 (n. - l)/j 


where n a is defined in (4 5) and (T)^ = Y(T + d)/T(T). 

We now consider the problem of identifying from (6 4) the distribution of 
Li(mvc) (when Iliimvc) is true). Let 0 be a beta variate, i.e , a variate whose 
c.df, F(6), is 


(6.5) F(6) = Io(P, Q), (O<0<1;P,Q>O), 

which is the Incomplete Beta Function ratio. Ij(P, Q ) is tabulated in [1] 
and [3]. The d-th moment of 9 is: 


( 6 . 6 ) 


v, fn * r (p + d) r(P + Q) 

w r(P) T{P + Q + d) 


(P)i/(P + Qh, 


(d = 0, 1, • • • ). Let 

o 

(6.7) T = He, (c = 1,2, •), 

i-i 

where the 6,(j = 1, ■ • , c) are mutually independent and each 6, is a beta 
variate, having parameters p,, q,, say The d-th moment of r is 


( 6 . 8 ) 


P(r) d = f[ (p<)i/(p, + q,)d , (d = 0, 1, • ) 

i-i 


Given a variate, say m (0 < p < l)j whose d-th moment (d — 0, 1, ■ ) is given 
by (6.8) we can infer by means of the solution of the Hausdorff problem of mo¬ 
ments that m and r have the same exact probability distribution function (see 
Corollary 1.1 [2, p. 11]). It should be noted that (6 4) can be written as 

h Ta~l 

®[L 1 (»wc)]‘ i = nn [(.Pas^d/ (p as a + Qaifd] j 

fl=l «a=l 


(6.9) 
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where 


pa = t(iV - q - s„ - 72 . + a - l)/2] > 0, 




(s<. — 1) i Q + s - 


L (B- -1) 


+ 


_+ n a — 
' 2 ' 


+ 


f 


> 0 ; 


thus (6.4) is a special case of (6.8). 

The exact probability (density) function, say g(r), of r has been obtained by 
Wilks [7, p. 475] and is: 

g ( r) - - t )^- 1 f 1 - ■ fvr'vr 1 • ifc i_1 

Jo Jo 

X (1 - yi ) ! -‘-’—-'(l - . . (1 - t; c _ 1 ) r ‘-«'- 1 


(6.10) X U - Bi(l - r)] Pl ~ Pl - Q1 [1 - fa + «,(! - Vi) J (1 - ■ ■ • 


X [1 ~ fa + Vl(l — Vi) + ‘ + y c _i(l — «l)(l — v») • • (1 — fe-a)} 


0 —1 

Xll^n 



(1 - tT 


tj = L (?W + 

J'-O 


V] - Yl pa-]'- An approximation of the distribution of a product of inde- 

J'—O 

pendent beta variates by the distribution of a single beta variate is given in [4], 
The results of this section may be summarized as follows: If Ili(mvc) is true, 
the d-lh moment (d = 0, ],•■■) of the exact distribution of Li(mvc) is given by 
(6.4). Also, if Hfmvc) is true, the exact distribution of L\(mvc) is given by 
(6.10), where the p, ,qj, and c can be specified by means of (6.4). The cumula¬ 
tive distribution of Li{mvc) is given for certain special cases in section 7g. 


7. Single Sample Criteria. The solutions of problems (i) and (ii) (see section 
1) for Hfmvc) are contained in (4,9) and the summary at the end of section 6. 
In the present section solutions of problems (i) and (ii) are given for each of the 
remaining two Hi hypotheses and the three Hi hypotheses (all of which are stated 
in section 2). For any of the hypotheses the sample criterion is chosen as a 
single-valued function of the likelihood-ratio criterion for the hypothesis. The 
methods of determining the moments and identifying the distribution of each 
sample criterion (when the corresponding null hypothesis is true) are entirely 
similar to those used in sections 4, 5, and 6 in regard to Hi(mvc). Section 7g 
gives the exact distributions of the single-sample criteria for certain special 
compound symmetries. 

Each criterion discussed in this section is based on a sample 
0tf(Z la Xi a )(a = 1, ■ • ■ , N, N > t) 



COMPOUND SYMMETRY 


463 


of size N from a normal f-variate distribution (f = 3, 4, ■ ). As in the case of 
Hi(mvc ), it is presupposed for testing ITi(vc) or lh(m) that there is a certain 
partition (l 1 ', m , ?i 2 , ■ • ■ , ru) of the 1-variates, for testing Hi(mvc), Hi(vc), 
or Hi(m) it is presupposed that there is a certain partition {n) of the t variates 
(see sections 2 and 3). 


7a. The test Li(vc) for the hypothesis H\{vc). For the sample criterion for 
Hi(vc) we choose 

(7 1) L\(vc) = [Xi(rc)] 2/W = \v X] \/ \v„ [, (i, ] = 1, • ■ , t) 


where Ai(rc) is the likelihood-ratio criterion for Hfvc), is defined in (4.3), and 


V gs f — Vaa r j 

(t/'fta) jLl ^sjc y 

la 

®»ai a = (l/^n) v lalni 
U 

V'ala = [1 /n<t{ n a l)] X/ Vl'aik I 

^aT^la 

ViaJa' = (1 /'^a'^a') S y lala• ) 

(s, s* — 1, • , hj n, “ 1, ■ * * , hj a n J f 0 j , Ja , Ja “ li "h fla A 1, ‘ J 
b + A,+i , n a = ni + • • + rc„_i; 7 I 1 . = 0). Since || S M || is a block symmetric 
matrix, there is an expression for | k, j that is entirely similar in form to the 
expression in (3.3) for | A,, | (see also (4.9) and (4.10)). 

If Hi(vc) is true, 


E[Uvc)) d = lUt(N- i, 2d) 


(7 2) 


- 1 + 2d){n a - 1), -2 d(n a - 1)] 
X {H +[N - r + 2d, — 2d]\ tf[ (»« ~ D*"*^ 


r/iV — q — Sq — n a + a - l \ 

nr "tt J \_ .. 1 ———M, (d = o, !,••■)» 


= nn 


1 ( N ~ 1 1 ( s ° ~ ^ 

I V 2 ^ (n a - T)Jd 


where q = b + h and ^(B, S), n a and (T)s are defined in (5 5), (4.5), and (6.4), 
respectively. From (7.2) and the argument given after (6 8) it follows that 
if Hi(vc) is true, the exact distribution of Li(rc) is given by (6 10), where the 
p,, g,, and c can be specified by means of (7.2). 
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7b. The test Li(m) for the hypothesis Th(rn). For the sample criterion for 
Hi(m) we choose 


(7 3) 


Li(m) = [X!(m)] z/W 



(hi = I, ■■,[), 


where Xi(m) is the likelihood-ratio criterion for 7/i(m) and and v tJ are defined 
in (4.7) and (7.1), respectively. In passing we note that 


(7.4) 


[Li(m)][Li(vc)] = Li(mvc). 


If Hi(m) is true, 

E[Li(m)]' 1 = fl {m - 1 )(« a - 1 ), 2a(n a - 1 )] 

a=i 


(7.5) 


X 'P[(n a — 1)(JV + 2a), — 2 a(n a — 1)]} 

- I s 


-nnM,- 4^ 


77 — 1 _|_ Sa 


n=l s a ««l 


(E + 

. \ 2 n a 1 /d 


(d = 0 , 1 , ••). 


If tfi(m) is true, the exact distribution of Li(m) is given by (6.10), where the 
p, , q, and c can be specified by means of (7,5). It follows from (7 5) that the 
exact distribution of Li(m), when Ih(m) is true, does not depend on b 


7c_ The lest Li(mvc ) /or the hypothesis Hi(vwc). The sample criterion, L^mvc), 
for Hi(mvc) (see section 2 ) is 

(7.6) Li(mvc) = [X 1 (mrc)] 2/ff = \v„\/ \v[, |, (i, j = 1, • • , t) 

where Xi(towc) is the likelihood-ratio criterion for Hi(mvc), is defined in (4.3), 
and 


= (1/n) E (X laa - X' a )\ 

a,2 a 


W *a)o 

= [l/n(n 

- 1)1 E (X,> - r a )(x iia - x' a , 

>, 


(ia 

^ Ja); 



l a^la 





_/ 

V i^a' 

= (1/n) 

E, (x ua - x' a )(x K , a - x' a ,), 






a 

•nVa' 







(C 

Ja 4" 

n(a' - 

- a); a 

X a'); 

_/ 

V iaK' 

= [1 /n(n 

- D] E, (x laa - x' a )(x K , tt - ; 

Cl, j1'a* 

X'i) 






oc 

^ Ja + 

n(a - 

- a'), a 

^ o'); 

(a = 

1j ' •" , h, 

ia , Ja , ha , ka = (d — I )n + 1 , • 1 • 

, an-, h a 

' = 'la 

+ n(a' 

- a). 


K' i a 4 - n (a! — a); a. = 1, ■ • ■ , N). || v %J || is a block symmetric matrix, 
of type II (see (3 4)), in which the blocks are formed by a partition in') (t = nh) 
of the rows and columns; there is an expression for | tj, | that is entirely similar 
in form to the expression m (3 5) for | Ay \ 
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If Hi(mvc) is true, 


(7.7) 


E[Umvc)\ d = (» - 1 ) hd ^ j _ t> 2rf )J 

X + 2d)(« - 1) + 1 - a , -2 d(n 


h n—1 

= nn 

a=l **“1 


- /i - s - (n - l)( a - l)^ 


(f 


+ - 1 ~ 0 
2 (n - 1) + 




(d - 0 , 1 , ■••). 

If Bi(mvc ) is true, the exact distribution of Li(mrc) is given by (6.10), where 
tllG Jb > 9] and c can be specified by means of (7 7) 


7d. The test L x {vc) for the hypothesis Bi(vc). The sample criterion, Ldvc) for 
jni(wc) (see section 2) is 

(7 ‘ 8) Z ^ c ) = [Uvc)f lN = I v h I / I S„ I ( t> 3 = 1, .. • , 4 ) ; 

where Ai(uc) is the likelihood-ratio criterion for B^vc), v t} is defined in (4.3), and 

*... ~ (1/^) *b„Ja 1 
u 

K,a = [l/n(n- 1)] E ^ (i , (l.^.), 

~ (1/^) £ y ;,4' i (fca' = jo + n(a' — a)-, a ^ o'), 

ja-X’o' 7 

= [l/n(n — 1)] v ia y a ,, (ftj, 7^ j a + n(a' - n); # ^ a'), 

vheie the langes of a, i 0 , j a , h a , k a are given in (7.6) There is an expression 
f°rj| | which is entirely similar in form to the expression in (3 5) for | A v I 
If ffi(vc) is true, 


- ( 
n 


(7 9) 


£[L,(uc)]' i = (n - | II i(N - i, 2d) 

-A+1 

li 

r 

a==l 


X < £ W - 1 + 2d)(n - 1) + 1 - a, -2d (n - 1)] 


n—1 

= nn 

a™I 3^1 


[( 


N - h — s — (n - l)(a - 1) 


), 


fN — 1 . l — o s — l\ 
\ 2 * 2(n - 1) + 


id = 0, 1, ■••). 


If Hi(vc) is true, the exact distribution of Li(vc) is given by (6.10), where the 
Pi > 9i and c can be specified by means of (7 9). 
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7e- The test Li(m) for the hypothesis H x {m). The sample criterion 
for (see section 2) is 


(7 10) 


Urn) = Mm)] m 


Li(mwc) 

Li(vc) 



where Xi(m) is the likelihood-ratio criterion for fh(m) and || v ( j |[ and || tq, j| are 
given in (7.8) and (7.6), respectively. 

If Bi(m) is true, the d-th moment (d = 0, 1, • ■ •) of Lfm) is 


(7.11) 


eumy = n n 







. 1 - a s - A ’ 

2 (n — 1) n — 1/d 
1 — a . s — l\ 
2(n-1) + ~\)i 


i 


(d = 0,1, •••) 

If 7?i(m) is true, the exact distribution of Lj(m) is given by (6.10) where the 
p ,, q, and c can be specified by means of (7,11). 


7/ Relations among Li(mvc), L x (vc), and L x (m) ami among L x (mvc), L x (vc), 
and Lfm). Li(mvc) is the product of Li(vc) and L x (w) (see (7.4)); moreover, 
when Hi{mvc) is true, the d-th moment (d = 0, 1, • • • ) of L\{mvc) equals the 
product of the d-th moments of Li(vc) and Li(ni) (see (6 4), (7.2), and (7.5)) 
'From this result and the argument given after (6.8) it follows that when H x {vwc) 
is true, L x (mvc ) is the product of two independent chance quantities, namely, 
Li(vc ) and L x (m). Similarly, when Bi(mvc) is true, L x (mvc) is the product of 
two independent chance quantities, namely, Li(vc) and L x (m). 


7 g. Exact distributions of single sample criteria in special cases. For a sample 
of size N and a partition (1", n x , ■■ • , mi) of the t variates of II (see section 2) 
let the cumulative distribution function (c d.f) of L x (mvc), when H x (mvc) is 
true, be 

(7.12) F(u \l b ,n x , ■ • • , n h | N) = Prob {Li(mrc) < u }; 

also, let F{y [ l 1 , n x , • • ■ , n>, | N) and F(z | l 6 , m , ■ , n h \ N) be the c d f.’s 
of L x (vc) and L x (rn) when H x (vc) and Hi(m) are true, respectivelyLet 
F(u | n | N), F(y \ n h \ N), and F{z | n h | N) be the c.d.f.’s of L x (mvc), L x (vc), 
and Zi(m) when Tl x {mvc), H x (vc ) and J?i(m) are true, respectively. 

It can be shown that 

F(u\l\ 2 | N) = L[(N - b - 2)/2, ( b + 2)/2], 

F{u i l b , 3 | N) =I v -[N-b -3,5 + 3], 

E(y 11 6 , 2 IJV) = u [(N-b- 2)/2, (b + l)/2], 
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(7 13) Hy 1 3 {N) = 7 Vi;^ - h - 3 - b + 2], 

I A n I N) = I,- [(N - l)(n - l)/2, {n - l)/2], [s' = z^\ 

F{u | 2 s [ AT) = / v - [N - 4, 3], 

1 2 2 | AO = 7 V = [IV - 4, 2], 

F(z I u 2 1IV) = h. [(N - l)(n — 1) — 1, ft — 1], [z 1 = * IW - ,) ] I 

where I X (P, Q ) is defined in (6,5). 

Distributions of the criteria in certain cases where the normal distribution is 
completely symmetric (see section 2) are given in [5]. 

7 h Asymptotic distributions oj the single sample criteria When the sample 
size, N , is large, we may use a theorem [6] (see also [9, pp. 151-2]) concerning 
the approximate distribution of the likelihood-ratio criterion. For large N the 
distributions of the quantities —N In Li(mvc), -N In Li(vc), and -N In L x {m) 
(when Hi(mvc), H\{vc), and iZi(m), respectively, arc true) are approximately 
chi-square distributions with (1/2) [t(t + 3) — b(b + 3) - h(h + 5)] - hb, 
(1/2 )[t(t + 1) — b{b + 1) - h{h + 3)] — hb, and t — b — h degrees of free¬ 
dom, respectively. Also, for large N the _distnbutions of the quantities 
—N In Li(mvc), —N In Li(vc), and — N In L x (m) (when flx(mvc), Hiivc), and 
Fh(m), respectively, are true) arc approximately chi-square distributions with 
[t(t + 3)/2 — h(h + 2)], [t(t -f- l)/2 — h(h + 1)], and l — h degrees of freedom, 
respectively. 

8. /c-Sample Criteria. In this section solutions of problems (i) and (ii) (see 
section 1) are given for the three Hi and the three Hi hypotheses (all stated 
in section 2). 

A test of any of these hypotheses is based on k simple, random samples (1c > 2) 
from k compound-symmetric, normal i-variate distributions. The probability 
density function, Q, of the k samples, say, 0 Kf (p = 1, • • • , 7; N T > b -)- h) is 

(8D rfn^r^ 

L p 1 

X GXp [‘ G lh p{Xia p 

hhP'Gp 

If 

(N 1 = X) AT„ ; I, j = 1, ■ • ■ , f), where X, ap is the a„-th sample value of the 

jj->i 

i-th variate in the p-th population (a„ = 1, ■ • • , N P ), m,, v is the mean (expected 
value) of the i-th variate in the p-th population, and (1/2) || G„, P ]| 1 is the 
variance-covariance matrix of the variates in the p-th population (see (3 1)). 
For a given set of k samples Q is the likelihood function of the parameters 
G\j,v and m, tp (i, j 1, , t, p 1, , ^0 ■ 
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The six hypotheses under consideration (see section 2) can be restated m terms 
of G t/ , v and m hP , e.g , H k (MVC \ mvc) asserts that im, 1 = m , i2 = ■ = m,,*, 

and || Gu,i || = || 0 ^, 2 1 | = ■■ = || G^,k || given that for all p the vector 
(mi.p , • • • , mi, v ) is block symmetric and the matrix || || is block symmetric 

(of type I) for a preassigned partition (l 11 , n 1 , • • • , >h ) of the t variates (see 
sections 2 and 3) 


8a Expressions for the criteria. Let X« .(MVC | mvc), ■ ■ ■ , X*(Af | mvc ) repre¬ 
sent the likelihood-ratio criteria for the six hypotheses Hk(MVC | mvc), ■ • • , 
H k (M | mVC) respectively, and let L k (MVC \ mvc), • , L k {M \ mVC) be the 

sample criteria for the respective hypotheses. We choose the L\ as follows: 

L k {MVC | mvc) = MMVC | mvc)} 2 , 

L k (VC | mvc) = [X*(j¥C | mvc)} 2 , 

(8 2 ) L,IM | mVC ) = [\ k (M \ mV®}' 1 , 

lL k (MVC | mvc)\ llNI _ 

\ Lk(VC | mvc) j ’ 


the expressions for L l (MVC | mvc), L k {VC \ mvc), and L k (M \ mVC) are the same 
as those in (8,2) with X fi replaced by . The Xjt and \ k can be obtained explicitly 
by straightforward application of the likelihood-ratio method (see the paragraph 
preceding section 4a). 


8b Moments of the h-sample criteria The exact distribution of any of the 
/c-sample criteria, when the corresponding null hypothesis is true, is given in 
( 6 . 10 ), where the quantities p, ,qj, and c can be specified by means of the moment 
expressions given below. The moments have been obtained by means of the 
operators discussed in Section 5. 

For each of the following six moment expressions the null hypothesis, cor¬ 
responding to the sample criterion involved, is assumed to be true: 
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X 


f nrra 

p=l a=l \ l 


+ 


(< - 1 ) y 

■AT 33 (jla 1) / d ^ > 


n y/i 1 (tf-D \ 
l ,4 4±i \2 T #'(n„ - 1) /j J 

AT — 7c -|- 1 — 4 


E[Li(AT | m7C)] d = f[ 


E[L t (M7C I mvc)] d 


N' - r\ 

2 ) d 

_Jii«=i4-i\2 2N V ' i\T„ A 

S JST # / 1 -t 

rrnft - — + u — 

„=1 tk \2 2N' ' N> 


E[Lk(VC | mi»c)] d = 


f ft ft"tf ft + ^ 1 

iia -1 ui±i \2 T 2N„(n - 1) ^ N P (n - 1 )/j 

ft "ff 0 ft~ + 1 -a . (* ~ 1 ) \ ’ 

l M 44 \2 ^ 2N'(n - 1) " r jV'(» - lj/d J 

ft ft ft (- - — + ~- 1) 

ii 44 \2 2N P N, 


nn (l - (t + I} + ( " ~ ]) 


a=>l u** 1 


2 N' 


N' 


rftft^Vf 11 /1 , i-a , (u;- 1 )^ 1 

fti a=I 4J4 \2 " t " 2jV p (n - 1) T iy„(n - 1) A 1,. 

ft"ff lT ft 1 1 ~ a I (W, ~ 1)N 1 ’ 

l ii 44 \2 + 2N'(n - 1) ^ N'(n - 1 )ft J 


£[A(M|m7C)] t ' = ft 

a=l 


AT' - Jb + 1 - o 


1 m 

where d = 0, 1, ■ ■ • and (T)d is defined 111 (6.4). 


8c. Comments on the criteria. By an argument similar to that used in section 7f 
it follows from (8 3) that when H k (MVC \mvc) is true L k (MVC | mvc) is 
the product of two independently distributed chance quantities, namely, 
L k (VC | mvc) and [LiJM \ mVC)) N '. The same assertion holds true if we re¬ 
place each L by L and H by H. 

Exact distributions of the /c-sample criteria, when the corresponding null 
hypotheses are true, can be obtained explicitly for special values of k and special 
compound symmetries, but owing to lack of space we shall not consider them 
in this paper. 
When the sample size $N'$ is large, the exact distributions of

$-\ln L_k(MVC \mid mvc)$, $-\ln L_k(VC \mid mvc)$, $-N' \ln L_k(M \mid mVC)$,

$-\ln \bar{L}_k(MVC \mid mvc)$, $-\ln \bar{L}_k(VC \mid mvc)$,

and $-N' \ln \bar{L}_k(M \mid mVC)$ (if the corresponding null hypotheses, respectively, are true) are approximately chi-square distributions with

$$(k-1)\left[\frac{b(b+3)}{2} + hb + \frac{h(h+5)}{2}\right], \qquad (k-1)\left[\frac{b(b+1)}{2} + hb + \frac{h(h+3)}{2}\right],$$

$q(k-1)$, $h(h+2)(k-1)$, $h(h+1)(k-1)$, and $h(k-1)$ degrees of freedom, respectively.


9. Illustrative examples. The first of the following two examples² illustrates the use of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ in a psychometrics experiment; the second example illustrates the use of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ in a medical-research experiment (see section 1).

Example 1. In an experiment to establish methods of obtaining reader reliability in regard to essay scoring, 126 examinees were given a three-part English Composition examination. Each part required that the examinee write an essay, and for each examinee four scores were obtained on the following four things, respectively: (1) the part-2 and part-3 essays together, (2) the original part-1 essay, (3) a long-hand copy of the part-1 essay, (4) a carbon copy of the long-hand copy in (3). Scores were assigned by a group of "English Readers" using procedures designed to counterbalance certain experimental conditions. The score on (1) serves as a criterion. The experimenter asks whether, on the basis of the sample (of size 126), the quantities associated with (2), (3), and (4) can be considered as interchangeable among themselves and interchangeable with respect to their relation to the criterion (1).

Let $X_1$, $X_2$, $X_3$, and $X_4$ be the scores on (1), (2), (3), and (4), respectively. It is assumed that $(X_1, X_2, X_3, X_4)$ has a normal 4-variate distribution and that the set of scores $(X_{1\alpha}, X_{2\alpha}, X_{3\alpha}, X_{4\alpha})$ $(\alpha = 1, \ldots, 126)$ obtained from the essays is a random sample of values of $(X_1, X_2, X_3, X_4)$. The following three questions will be considered (see section 2), where the grouping of the four variates is (1, 3): (a) Is the sample consistent with the hypothesis $H_1(mvc)$? (b) Is the sample consistent with the hypothesis $H_1(vc)$? (c) Is the sample consistent with the hypothesis $H_1(m)$? In the particular experiment under discussion (a) is the experimenter's question.


² Mr. L. R. Tucker (Educational Testing Service, Princeton, New Jersey) and Captain J. Allan Rafferty, M.D. (Air University School of Aviation Medicine, Randolph Field, Texas) kindly gave the author the data for Examples 1 and 2, respectively.
The sample means and variance-covariance matrix are as follows:

            X_1       X_2       X_3       X_4
  X_1   77.8976   20.9425   23.4544   18.0384
  X_2   20.9425   25.0704   12.4363   11.7257
  X_3   23.4544   12.4363   28.2021    9.2281
  X_4   18.0384   11.7257    9.2281   22.7390

Means   28.0556   14.9048   15.4841   14.4444

This matrix is $(1/126)\,\|v_{ij}\|$ $(i, j = 1, \ldots, 4)$ (see (4.3)). The sample criteria $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ will be used to answer questions (a), (b), and (c), respectively. The values of the criteria can be computed from the values of $|v_{ij}|$, $|v'_{ij}|$, and $|\bar{v}_{ij}|$ (see (4.9), (7.1), (7.3)), where $v'_{ij}$ is given in (4.7) and $\bar{v}_{ij}$ is given below (7.1). The $\bar{v}_{ij}$ $(i \neq 1 \neq j)$ are evaluated by simple averaging of certain elements in $\|v_{ij}\|$. Both $|v'_{ij}|$ and $|\bar{v}_{ij}|$ have the block pattern of (3.2) and can be expressed in the simplified form of (3.3), where $h = 1$ and $n_1 = 3$; the simplified form of $|v'_{ij}|$ can also be obtained from (4.10) and (4.11). From the data above it is found that

$L_1(mvc) = |v_{ij}| / |v'_{ij}| = .9214$,

$L_1(vc) = |v_{ij}| / |\bar{v}_{ij}| = .9568$,

$L_1(m) = |\bar{v}_{ij}| / |v'_{ij}| = .9630$.

The second, fourth, and fifth formulas in (7.13) (for $N = 126$, $b = 1$, $n = 3$) give the distributions of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$, respectively (when the hypothesis with which the criterion is associated is true). By direct computation with expressions for the Incomplete Beta Function ratios, the per cent points corresponding to the observed values of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ are found to be .26, .49, and .09, respectively. Thus at the 5% significance level the answer to each of the three questions (a), (b), (c) is yes. Critical values of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ for various significance levels can be obtained from [3] by interpolation.
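The three determinant ratios can be reproduced approximately from the printed matrix. The sketch below rests on two assumptions, since (4.7) and (7.1) are not reproduced in this section: that the averaging which produces $\bar{v}_{ij}$ is the average over all reorderings of the interchangeable variates $X_2, X_3, X_4$ (variances with variances, covariances among them with each other, and their covariances with $X_1$ with each other), and that $v'_{ij}$ comes from applying the same averaging to the matrix plus the outer product of the deviations of the sample means from their averaged values. The helper names are illustrative. Because each criterion is a ratio of determinants, the common factor $1/126$ cancels.

```python
import itertools
import numpy as np

# Example 1 as printed: the matrix (1/126)||v_ij|| and the four sample means.
S = np.array([
    [77.8976, 20.9425, 23.4544, 18.0384],
    [20.9425, 25.0704, 12.4363, 11.7257],
    [23.4544, 12.4363, 28.2021,  9.2281],
    [18.0384, 11.7257,  9.2281, 22.7390],
])
means = np.array([28.0556, 14.9048, 15.4841, 14.4444])

# Grouping (1, 3): X_1 stands alone; X_2, X_3, X_4 are assumed interchangeable.
PERMS = [[0, *q] for q in itertools.permutations([1, 2, 3])]

def symmetrize_matrix(A):
    """Average A[q, q] over all reorderings q of the group (elementwise pattern averaging)."""
    return sum(A[np.ix_(q, q)] for q in PERMS) / len(PERMS)

def symmetrize_vector(v):
    """Average the group components of v, leaving the lone variate unchanged."""
    return sum(v[q] for q in PERMS) / len(PERMS)

d = means - symmetrize_vector(means)             # deviations from pattern-averaged means
S_bar = symmetrize_matrix(S)                      # assumed analogue of ||v-bar_ij|| / 126
S_prime = symmetrize_matrix(S + np.outer(d, d))   # assumed analogue of ||v'_ij|| / 126

det = np.linalg.det
L_vc = det(S) / det(S_bar)
L_m = det(S_bar) / det(S_prime)
L_mvc = det(S) / det(S_prime)
print(f"L1(vc) = {L_vc:.4f}, L1(m) = {L_m:.4f}, L1(mvc) = {L_mvc:.4f}")
```

Under these assumptions the results agree with the printed .9568, .9630, and .9214 to within a few units in the fourth decimal place, and $L_1(mvc) = L_1(vc)\,L_1(m)$ holds exactly, as the determinant ratios above imply.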

Example 2. In an experiment to study certain properties of the blood of asphyxiated dogs, the %CO2 and hematocrit of 10 asphyxiated dogs were measured four minutes and seven minutes after asphyxiation. Let $X_1$ and $X_3$ be %CO2 and hematocrit four minutes after asphyxiation, respectively, and $X_2$ and $X_4$ be %CO2 and hematocrit seven minutes after asphyxiation, respectively. It is assumed that $(X_1, X_2, X_3, X_4)$ has a normal 4-variate distribution and that the set of measurements $(X_{1\alpha}, X_{2\alpha}, X_{3\alpha}, X_{4\alpha})$ $(\alpha = 1, \ldots, 10)$ obtained from the 10 dogs is a random sample of values of $(X_1, X_2, X_3, X_4)$. The following questions will be considered, where the grouping is (2, 2): (a) Is the sample consistent with the hypothesis $H_1(mvc)$? (b) Is the sample consistent with the hypothesis $H_1(vc)$? (c) Is the sample consistent with the hypothesis $H_1(m)$?

In the particular experiment under discussion (a) is the experimenter's question.
The sample means and sums of squares and cross-products are as follows:

             X_1        X_2        X_3        X_4
  X_1    294.916    313.908    -89.364    -69.282
  X_2    313.908    363.689   -130.422    -69.261
  X_3    -89.364   -130.422    210.356    241.688
  X_4    -69.282    -69.261    241.688    515.789

Means     50.780     53.590     41.180     43.890

This matrix is $\|v_{ij}\|$ $(i, j = 1, \ldots, 4)$ (see (4.3)). The sample criteria $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ will be used to answer questions (a), (b), and (c), respectively. The values of these criteria can be computed from the data above (see (7.6), (7.8), and (7.10)) and are found to be:

$L_1(mvc) = |v_{ij}| / |v'_{ij}| = .09107$,

$L_1(vc) = |v_{ij}| / |\bar{v}_{ij}| = .3259$,

$L_1(m) = |\bar{v}_{ij}| / |v'_{ij}| = .2794$.

The sixth, seventh, and eighth formulas in (7.13) (for $N = 10$, $n = 2$) give the distributions of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$, respectively (when the hypothesis with which the criterion is associated is true). From [1] it is found that the observed values of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ correspond to the 1.2, 12.4, and 0 per cent points, respectively, of the distributions referred to above. Thus at the 5% significance level the answer to questions (a) and (c) is no and to (b) is yes. The critical values of $L_1(mvc)$, $L_1(vc)$, and $L_1(m)$ for various significance levels can be found from [3].
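The value of $L_1(vc)$ for Example 2 can be reproduced in the same way, but only under an assumption about the compound-symmetry pattern for this grouping, which this section does not spell out. The sketch below assumes the pattern is invariance under interchanging the two measurement times, that is, swapping $X_1 \leftrightarrow X_2$ and $X_3 \leftrightarrow X_4$ together, so that the averaged matrix pairs $v_{11}$ with $v_{22}$, $v_{33}$ with $v_{44}$, $v_{13}$ with $v_{24}$, and $v_{14}$ with $v_{23}$. It takes the matrix as symmetric with $v_{24} = v_{42} = -69.261$. It also checks that the printed criteria of both examples satisfy $L_1(mvc) \approx L_1(vc)\,L_1(m)$, as the determinant ratios in the text imply.

```python
import numpy as np

# Example 2 (sums of squares and cross-products), variables ordered X_1..X_4.
V = np.array([
    [294.916, 313.908, -89.364, -69.282],
    [313.908, 363.689, -130.422, -69.261],
    [-89.364, -130.422, 210.356, 241.688],
    [-69.282, -69.261, 241.688, 515.789],
])

# Assumed pattern: swap the 4-minute and 7-minute readings of both measurements at once.
PERMS = [[0, 1, 2, 3], [1, 0, 3, 2]]
V_bar = sum(V[np.ix_(q, q)] for q in PERMS) / len(PERMS)

L_vc = np.linalg.det(V) / np.linalg.det(V_bar)
print(f"L1(vc) = {L_vc:.4f}")

# Printed (L1(mvc), L1(vc), L1(m)) for Examples 1 and 2: compare L1(mvc) with the product.
for L_mvc_p, L_vc_p, L_m_p in [(0.9214, 0.9568, 0.9630), (0.09107, 0.3259, 0.2794)]:
    print(L_mvc_p, round(L_vc_p * L_m_p, 5))
```

With this pattern the sketch reproduces the printed .3259 to about four decimals; other patterns (for instance, requiring all four cross-covariances to be equal) give a noticeably different value, which is what suggests the assumption.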

More than one of the sample criteria may be of interest in regard to a given sample (see [5], pp. 267-268). For example, in an experiment such as that described in Example 1, suppose the answer to question (a) is no. The experimenter might then consider question (b); if the answer is no, the inconsistency of the sample with $H_1(mvc)$ might be regarded as due to the variances or covariances. If the answer to (b) is yes, the experimenter might then consider (c); if the answer here is no, the inconsistency of the sample with $H_1(mvc)$ might be regarded as due to the means. If, however, the answer here is yes, further study might be required to "explain" the inconsistency.
BRANCHING PROCESSES¹


By T. E. Harris

Project RAND, Douglas Aircraft Company 

1. Summary. This paper is concerned with a simple mathematical model for a branching stochastic process. Using the language of family trees we may illustrate the process as follows. The probability that a man has exactly $r$ sons is $p_r$, $r = 0, 1, 2, \ldots$. Each of his sons (who together make up the first generation) has the same probabilities of having a given number of sons of his own; the second generation have again the same probabilities, and so on. Let $z_n$ be the number of individuals in the $n$th generation. We study the probability distribution of $z_n$. Some previous results are given in section 2; these include procedures for computing moments of $z_n$, and a criterion for when the family has probability 1 of dying out. In sections 3 and 4 the case is considered where the family has a non-zero chance of surviving indefinitely. In this case the random variables $z_n/Ez_n$ converge in probability to a random variable $w$ with cumulative distribution $G(u)$. It is shown that $G(u)$ is absolutely continuous for $u \neq 0$. Results of a Tauberian character are given for the behavior of $G(u)$ as $u \to 0$ and $u \to \infty$. In section 5 some examples are given where $G(u)$ can be found explicitly; $G(u)$ is computed numerically for the case $p_1 = 0.4$, $p_2 = 0.6$. In section 6 families with probability 1 of extinction are considered. A method is given for obtaining in certain cases an expansion for the moment-generating function of the number of generations before extinction occurs. In section 7 maximum likelihood estimates are obtained for the $p_r$ and for the expectation $Ez_1$; consistency in a certain sense is proved. In section 8 a brief discussion is given of the relation between two types of mathematical models for branching processes.

2. Introduction. By a branching stochastic process is meant a phenomenon 
of the following general type: each of an initial aggregate of objects can give rise 
to more objects of the same or different types, the objects produced can then 
produce more, and the system develops, subject to certain probability laws. 
Examples are the development of human or animal populations, propagation of 
genes, and nuclear chain reactions. The mathematical model dealt with in this 
paper may be thought of as representing the generation-by-generation growth 
of a family, the fundamental random variable being the number of individuals 
in the nth generation. Under certain conditions, however, this model may 
describe the size of a family at a sequence of points in time. This question will 
be touched on in section 8.


¹ Based on a doctoral dissertation presented to the Mathematics Department, Princeton University, June, 1947.
Definition 2.1. The random variables $z_n$, $n = 0, 1, 2, \ldots$, will be said to represent a simple discrete branching process provided: $z_0 = 1$; $P(z_1 = r) = p_r$, $r = 0, 1, 2, \ldots$, with $\sum_{r=0}^{\infty} p_r = 1$; the conditional distribution of $z_{n+1}$, given $z_n = r$, is that of the sum of $r$ independent random variables, each having the same distribution as $z_1$.

Assumptions. Throughout this paper we assume that $\sum_{r=0}^{\infty} r^2 p_r < \infty$, that at least two of the $p_r$ are positive, and that $p_0 + p_1 < 1$.

Definitions 2.2. Let $x = Ez_1 = \sum r p_r$, $\sigma^2 = \operatorname{Var}(z_1) = \sum r^2 p_r - x^2$. Let $f(s) = \sum_{r=0}^{\infty} p_r s^r$ be the generating function of $z_1$ ($s$ denotes a complex variable). Let $p_{nr} = P(z_n = r)$ and $f_n(s) = \sum_{r=0}^{\infty} p_{nr} s^r$; of course $p_{1r} = p_r$ and $f_0(s) = s$. The assumptions given above ensure that the first and second derivatives $f'(s)$ and $f''(s)$ are continuous in the set consisting of the interior of the unit circle and the point $s = 1$; thus derivative notations such as $f''(1)$ are used even though $f(s)$ may not be analytic at $s = 1$. It will be seen shortly that a similar remark applies to the functions $f_n(s)$ and certain functions to be introduced later.

In the remainder of this section we shall summarize certain results, most of which are contained implicitly or explicitly in works by Fisher [1], Lotka [2], Steffensen [3], Ulam and Hawkins [4], Kolmogoroff [5], Kolmogoroff and Dmitriev [6], and Yaglom [7]; some of these references are not widely available.

From our definition, $P(z_{n+1} = k \mid z_n = j)$ is the coefficient of $s^k$ in $[f(s)]^j$. Hence $p_{n+1,k}$ is the coefficient of $s^k$ in $\sum_{j=0}^{\infty} p_{nj}[f(s)]^j$, whence

(2.1) $f_{n+1}(s) = f_n[f(s)]$.

Letting $n = 1, 2, \ldots$ successively, it follows that the generating function of $z_n$ is the $n$th functional iterate of $f(s)$. Hence

(2.2) $f_{n+1}(s) = f[f_n(s)]$.

We note that $f_n'(1) = Ez_n$ and $f_n''(1) + f_n'(1) - [f_n'(1)]^2 = \operatorname{Var}(z_n)$. Differentiation of (2.1) at $s = 1$ gives $f_{n+1}'(1) = x^{n+1}$; another differentiation gives $f_{n+1}''(1) = f_n''(1)[f'(1)]^2 + f_n'(1)f''(1)$, while twofold differentiation of (2.2) gives $f_{n+1}''(1) = f''(1)[f_n'(1)]^2 + f'(1)f_n''(1)$; these two expressions for $f_{n+1}''(1)$ can be equated and solved for $f_n''(1)$, provided $x = f'(1) \neq 1$. Thus the mean and variance of $z_n$ are given by

$$Ez_n = (Ez_1)^n = x^n; \qquad \operatorname{Var}(z_n) = \frac{\sigma^2(x^{2n} - x^n)}{x^2 - x}, \quad x \neq 1; \qquad \operatorname{Var}(z_n) = n\sigma^2, \quad x = 1.$$

Higher moments, if they exist, may be found by a similar process.
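As a numerical check of these formulas, the sketch below builds the coefficients of $f_n(s)$ exactly by composing polynomials as in (2.1), then compares the resulting mean and variance with $x^n$ and $\sigma^2(x^{2n} - x^n)/(x^2 - x)$. The offspring distributions $p = (0.2, 0.3, 0.5)$ and $p = (0.25, 0.5, 0.25)$ are illustrative choices, not taken from the text; the second has $x = 1$ and exercises the $n\sigma^2$ case.

```python
import numpy as np

def compose(outer, inner):
    """Coefficients of outer(inner(s)), both listed lowest degree first (Horner's rule)."""
    result = np.array([outer[-1]], dtype=float)
    for c in outer[-2::-1]:
        result = np.convolve(result, inner)
        result[0] += c
    return result

def iterate_pgf(p, n):
    """Coefficients of f_n(s): start from f_0(s) = s and apply f_{k+1}(s) = f_k[f(s)]."""
    fn = np.array([0.0, 1.0])
    for _ in range(n):
        fn = compose(fn, p)
    return fn

def mean_var(coeffs):
    """Mean and variance of a distribution on 0, 1, 2, ... from its pgf coefficients."""
    r = np.arange(len(coeffs))
    mean = float(r @ coeffs)
    return mean, float(r**2 @ coeffs) - mean**2

def formula(p, n):
    """Ez_n and Var(z_n) from the closed forms in the text."""
    x, s2 = mean_var(np.asarray(p, dtype=float))
    if abs(x - 1.0) < 1e-12:
        return x**n, n * s2
    return x**n, s2 * (x**(2 * n) - x**n) / (x**2 - x)

for p in ([0.2, 0.3, 0.5], [0.25, 0.5, 0.25]):
    for n in range(1, 7):
        print(p, n, mean_var(iterate_pgf(np.array(p), n)), formula(p, n))
```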

Definition 2.3. Denote by $a$ the smallest non-negative real root of the equation $t = f(t)$. We see that $x \le 1$ implies $a = 1$, while $x > 1$ implies $0 \le a < 1$, the equality $a = 0$ holding if and only if $p_0 = 0$. In no case can the half-open interval $0 \le t < 1$ contain more than one root. It is readily seen that

(2.3) $\lim_{n\to\infty} p_{n0} = \lim_{n\to\infty} f_n(0) = a$.
We thus have the well-known result: the number $a$ is the probability of eventual extinction of the family. The relation between $a$ and $x$ shows that the probability of extinction is 1 if and only if $x \le 1$.
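Relation (2.3) also gives a practical way to compute $a$: iterate $t \mapsto f(t)$ from $t = 0$. The sketch below does this and compares the result with the smallest root of $f(t) = t$ in $[0, 1]$ found by `numpy.roots`. The two offspring distributions are illustrative choices, one with $x > 1$ and one with $x < 1$; the critical case $x = 1$ is avoided because the iteration then converges only like $1/n$.

```python
import numpy as np

def extinction_by_iteration(p, tol=1e-14, max_iter=10_000):
    """a = lim f_n(0), computed by iterating t -> f(t) from t = 0, as in (2.3)."""
    coeffs_high_first = np.asarray(p, dtype=float)[::-1]
    t = 0.0
    for _ in range(max_iter):
        t_next = float(np.polyval(coeffs_high_first, t))
        if abs(t_next - t) < tol:
            return t_next
        t = t_next
    return t

def extinction_by_roots(p):
    """Smallest root of f(t) - t = 0 in [0, 1] for a polynomial pgf p (lowest degree first)."""
    q = np.asarray(p, dtype=float).copy()
    q[1] -= 1.0                                   # coefficients of f(t) - t
    roots = np.roots(q[::-1])                     # numpy.roots wants highest degree first
    real = [r.real for r in roots
            if abs(r.imag) < 1e-12 and -1e-12 <= r.real <= 1 + 1e-9]
    return min(real)

for p in ([0.2, 0.3, 0.5], [0.5, 0.3, 0.2]):
    x = sum(r * pr for r, pr in enumerate(p))
    print(p, "x =", round(x, 3), extinction_by_iteration(p), extinction_by_roots(p))
```

For $p = (0.2, 0.3, 0.5)$ the equation $t = f(t)$ factors as $0.5(t - 0.4)(t - 1) = 0$, so $a = 0.4$; for $p = (0.5, 0.3, 0.2)$, with $x = 0.7$, extinction is certain.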

It is also clear that $0 \le t < 1$ implies $\lim_{n\to\infty} f_n(t) = a$; this, together with (2.3), shows that

(2.4) $\lim_{n\to\infty} p_{nr} = 0$, $\quad r = 1, 2, \ldots$.

Relation (2.4) means roughly that the family either dies out or gets very large. In section 4 it will be shown that (2.4) holds uniformly in $r$.

Definition 2.4. The random variables $w_n$ are defined by $w_n = z_n/x^n$.

Clearly $Ew_n = 1$ and $Ew_n^2 = 1 + \dfrac{\sigma^2}{x^2 - x}\left(1 - x^{-n}\right)$ if $x \neq 1$.

Suppose $n > m$. Then $E(z_n z_m) = \sum_r E(r z_n \mid z_m = r)P(z_m = r) = \sum_r r^2 p_{mr} x^{n-m} = x^{n-m} Ez_m^2$. Thus $E(w_n w_m) = Ew_m^2$, whence

(2.5) $E[w_n - w_m]^2 = Ew_n^2 - Ew_m^2$, $\quad n > m$.

By virtue of (2.5) we obtain

Theorem 2.1. If $x > 1$, the random variables $w_n$ converge in mean square, hence in probability, to a random variable $w$.

For in this case $Ew_n^2 \to 1 + \sigma^2/(x^2 - x)$ as $n \to \infty$, and (2.5) shows that $E(w_n - w_m)^2 \to 0$ as $n$ and $m \to \infty$. Theorem 2.1 is then a consequence of [8], p. 38.

It is well known that convergence in mean square implies $Ew_n^2 \to Ew^2$ and $E(w_n - 1)^2 \to E(w - 1)^2$, whence $Ew_n \to Ew$.

Thus we have

(2.6) $Ew = 1$, $\quad Ew^2 = 1 + \dfrac{\sigma^2}{x^2 - x}$.

In order to study the behavior of $z_n$ for large $n$ when $x > 1$, we consider the distribution of $w$.

Definitions 2.5. $G_n(u) = P(w_n \le u)$; $\phi_n(s) = E(e^{w_n s}) = \int_{0-}^{\infty} e^{us}\,dG_n(u)$.

Definitions 2.6. (Applicable when $x > 1$.) $G(u) = P(w \le u)$; $\phi(s) = E(e^{ws}) = \int_{0-}^{\infty} e^{us}\,dG(u)$. We shall refer to $G(u)$ as the asymptotic distribution branching from $f(s)$.

The moment-generating functions (m.g.f.'s) $\phi_n(s)$ and $\phi(s)$ are defined at least for $\operatorname{Re}(s) \le 0$. Unless specifically stated otherwise we shall consider them only in that domain.

From (2.2) and the fact that $\phi_n(s) = f_n(e^{s/x^n})$ it follows that $\phi_{n+1}(sx) = f[\phi_n(s)]$. Theorem 2.1 implies that if $x > 1$, $G_n(u) \to G(u)$ and $\phi_n(s) \to \phi(s)$ for $\operatorname{Re}(s) \le 0$. Thus the m.g.f. $\phi(s)$ satisfies the functional equation

(2.7) $\phi(sx) = f[\phi(s)]$, $\quad \operatorname{Re}(s) \le 0$.
Equation (2.7), which of course is applicable only when $x > 1$, was obtained in a different form by Ulam and Hawkins. It belongs to a type usually known as Koenigs' equation, after the nineteenth-century mathematician who studied it in connection with functional iteration, and is related to an equation studied by Abel. We shall make some use of the work of Koenigs later. See Hadamard [9] and Koenigs [10].

We note that $Ew^r < \infty$ if and only if $Ez_1^r < \infty$. It was already pointed out that $Ew = 1$. As pointed out in [4], as many further moments of $w$ as exist may be found by successive differentiation of (2.7) at $s = 0$.

Finally we note that $G_n(0) = p_{n0}$. Hence $\lim G_n(0) = a$. Thus $G(0) = P(w = 0) \ge a$. We show later that $G(0) = a$. Clearly $G(u) = 0$ for $u < 0$.

In sections 3 and 4 we always assume $x > 1$.


3. Asymptotic properties of the moment-generating function. We first show that (2.7) uniquely determines the distribution of $w$. Specifically,

Theorem 3.1. Let $G_1(u)$ and $G_2(u)$ be distributions with equal first moments and finite second moments whose characteristic functions $\phi_1(it)$ and $\phi_2(it)$ satisfy ($t$ real) $\phi_r(itx) = f[\phi_r(it)]$, $r = 1, 2$. Then $G_1(u) = G_2(u)$.

From [13], p. 27, $\phi_1(it) - \phi_2(it) = t^2\beta(t)$, where $\beta(t)$ is bounded as $t \to 0$. From (2.7),

$$|\phi_1(itx) - \phi_2(itx)| = |f[\phi_1(it)] - f[\phi_2(it)]| \le x\,|\phi_1(it) - \phi_2(it)|,$$

since $|f'(s)| \le x$ when $|s| \le 1$. Hence for $t \neq 0$,

$$\left|\beta\!\left(\frac{t}{x}\right)\right| \ge x\,|\beta(t)|.$$

Thus $\beta(t)$ cannot be bounded near $t = 0$ unless it is identically zero; hence $\phi_1(it) = \phi_2(it)$.

It is clear that the requirement that $\phi(s)$ have the form $1 + s + O(s^2)$ between two rays from the origin is sufficient for the uniqueness in that domain of solutions of (2.7). On the other hand, continuous solutions can be constructed at will if the existence of a derivative near $s = 0$ is not required.

Before proceeding further, it is convenient to define three functions $k(s)$, $\psi(s)$, and $H(u)$ which are closely related to $f(s)$, $\phi(s)$, and $G(u)$ respectively. We repeat that we are considering only the case $x > 1$. See Definition 2.3 for $a$.

Definitions 3.1. Let $k(s) = \dfrac{f[a + (1 - a)s] - a}{1 - a}$. Clearly $k(s)$ is a probability generating function with $k(0) = 0$, $k'(1) = f'(1) = x$, $k''(1) < \infty$. We write $k(s) = \sum_{r=1}^{\infty} q_r s^r$. We also define the iterates $k_n(s)$ by

$k_0(s) = s$, $\quad k_{n+1}(s) = k[k_n(s)]$.

Definitions 3.2. Let $H(u)$ be the asymptotic distribution branching from $k(s)$. (See Definition 2.6.) Let $\psi(s)$ be the corresponding moment-generating function. We know then that $\psi(s)$ and $k(s)$ satisfy

(3.1) $\psi(sx) = k[\psi(s)]$.
In view of the uniqueness theorem we have, by direct substitution in (3.1), that $\psi(s)$ must be given by

(3.2) $\psi(s) = \dfrac{\phi[(1 - a)s] - a}{1 - a}$,

and that $H(u)$ must be given by

(3.3) $H(u) = \dfrac{G\!\left(\frac{u}{1 - a}\right) - a}{1 - a}$, $\quad u \ge 0$; $\qquad H(u) = 0$, $\quad u < 0$.

We shall see later that $H(0) = 0$, i.e., that $G(0) = a$. Therefore $H(u)$ is the conditional distribution of $(1 - a)w$, given that $w \neq 0$. Another way of stating this is as follows:

Theorem 3.2. The random variable $w$ is distributed as the product of two independent random variables $w_0 \cdot w'$, where $w_0$ takes the values $0$ and $1/(1 - a)$ with probabilities $a$ and $1 - a$ respectively, while $w'$ has the asymptotic distribution branching from $k(s)$.

For it is directly verifiable that $\phi(s)$ is the m.g.f. of $w_0 \cdot w'$.

In Theorems 3.3 and 3.4 we consider the behavior of $\psi(s)$ for large $|s|$. To make for smoother reading we defer the proofs till section 9, where somewhat more general formulations are given. In section 4 the properties of $\psi(s)$ are interpreted in terms of $G(u)$.

Definition 3.3. Let $\gamma = \dfrac{\log[1/f'(a)]}{\log x} = \log_x \dfrac{1}{k'(0)}$. (See Definitions 2.3 and 3.1.) If $f'(a) = 0$ (i.e., $p_0 = p_1 = 0$) we take $\gamma = \infty$.

Theorem 3.3. Suppose $\gamma < \infty$. Then if $\operatorname{Re}(s) \le 0$ and $s \neq 0$,

(3.4) $\psi(s) = \dfrac{M(s)}{|s|^{\gamma}} + M_0(s)$.

$M(s)$ is continuous for $s \neq 0$; $M(s)$ and $M_0(s)$ satisfy respectively

(3.5) $M(sx) = M(s)$, $\quad M_0(s) = O(|s|^{-2\gamma})$, $\quad |s| \to \infty$.

Remarks. (See section 9 for proof.) (a) Under the conditions of the theorem $M(s)$ is real and positive when $s$ is real and negative. (b) If $Ez_1^r < \infty$ and the conditions of the theorem hold, the $r$th derivative of $\psi(s)$ satisfies

(3.6) $|\psi^{(r)}(s)| = O(|s|^{-\gamma - r})$, $\quad |s| \to \infty$.

(c) If $\gamma = \infty$, $\psi(s)$ and as many derivatives as exist approach 0 exponentially as $|s| \to \infty$.

We now consider the behavior of $\psi(s)$ on the positive real axis, provided it is defined there.
Lemma 3.1. Let $f(s)$ be analytic in the circle $|s| < R$, $R > 1$. Then $\phi(s)$ and $\psi(s)$ are analytic in some neighborhood of $s = 0$.

We use a theorem of Poincaré [11] which ensures that there is exactly one function $\Phi(s)$ analytic near $s = 0$ with $\Phi(0) = \Phi'(0) = 1$ and satisfying

$\Phi(sx) = f[\Phi(s)]$.

(Although Poincaré's proof is for the case $f(s)$ rational, it applies equally well here.) The circle of convergence of the Maclaurin series for $\Phi(s)$ has radius $t_R$, where $\Phi(t_R) = R$. An argument whose details are given in [12], p. 21, then shows that $\phi(s) = \Phi(s)$ for $|s| < t_R$, and Lemma 3.1 follows. (The argument is necessary to rule out the possibility that the $\phi_n(s)$ converge to $\Phi(s)$ for $\operatorname{Re}(s) \le 0$ but to some other function for $\operatorname{Re}(s) > 0$.) Clearly $\phi(s)$ and $\psi(s)$ are entire if and only if $f(s)$ is entire.

Lemma 3.1 is useful for actual computation of $G(u)$. The (non-negative) coefficients $c_r$ in the series $\phi(s) = 1 + s + c_2 s^2 + \cdots$ can be determined by differentiating (2.7) at $s = 0$. The series can be used to compute values of the characteristic function $\phi(it)$ on some interval $0 \le t \le kx$, where $k$ is a small real number; the values of $\phi(it)$ for the remaining values of $t$ are determined by (2.7). (Note that the real and imaginary parts of $\phi(it)$ are respectively even and odd.) Then the usual inversion formula is used to obtain $G(u)$. A numerical example of this procedure is worked out in section 5.
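Differentiating (2.7) at $s = 0$ amounts to matching Taylor coefficients. Writing $\phi(s) = \sum_r c_r s^r$ with $c_0 = c_1 = 1$, the unknown $c_r$ enters the coefficient of $s^r$ in $f[\phi(s)]$ only through the linear term $f'(1)c_r = xc_r$, so $c_r(x^r - x)$ equals the coefficient of $s^r$ in $f[\phi(s)]$ computed with $c_r$ set to 0. The sketch below applies this recursion to the quadratic example $f(s) = 0.4s + 0.6s^2$ of section 5 and checks that the truncated series satisfies (2.7) numerically; the number of terms (40) and the helper names are illustrative choices.

```python
import numpy as np

P = [0.0, 0.4, 0.6]                       # f(s) = 0.4 s + 0.6 s^2 (section 5 example)
X = sum(r * p for r, p in enumerate(P))   # x = f'(1) = 1.6

def f(s):
    return sum(p * s**r for r, p in enumerate(P))

def compose_coeff(c, r):
    """Coefficient of s^r in f(phi(s)), where phi has Taylor coefficients c."""
    power = np.zeros(r + 1)
    power[0] = 1.0                        # phi(s)^0 = 1
    total = 0.0
    for p in P:
        total += p * power[r]
        power = np.convolve(power, c[: r + 1])[: r + 1]
    return total

def phi_coefficients(n_terms=40):
    c = np.zeros(n_terms)
    c[0] = c[1] = 1.0                     # phi(0) = phi'(0) = 1
    for r in range(2, n_terms):
        # c[r] is still 0 here, so this coefficient omits exactly the term x * c_r
        c[r] = compose_coeff(c, r) / (X**r - X)
    return c

def phi(s, c):
    return sum(cr * s**r for r, cr in enumerate(c))

c = phi_coefficients()
print("c_2 =", c[2], "so Ew^2 = 2 c_2 =", 2 * c[2])   # compare (2.6)
for s in (-1.0, -0.3, 0.2, 0.5):
    print(s, phi(s * X, c), f(phi(s, c)))
```

Here $c_2 = 0.6/(x^2 - x) = 0.625$, so $Ew^2 = 1.25$, in agreement with (2.6) and the value $E(w - 1)^2 = 0.25$ quoted in section 5.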

Definition 3.4. The number $\rho$ is defined by $\rho = \log_x d$ if $f(s)$ is a polynomial of degree $d$; $\rho = \infty$ otherwise.

Theorem 3.4. Let $f(s)$ (and hence $k(s)$) be a polynomial of degree $d$. Then for $s > 0$,

$\log \psi(s) = s^{\rho} L(s) + L_0(s)$,

where $L(s)$ is continuous and positive, and $L(s)$ and $L_0(s)$ satisfy respectively

$L(sx) = L(s)$, $\quad L_0(s) = O(1)$, $\quad s \to \infty$.

The proof is in section 9. (Theorem 3.4 may be compared with a more widely applicable but less precise result due to Shah [19].)

Corollary. If $f(s)$ is a polynomial of degree $d$, $\psi(s)$ is an entire function of order $\rho$ and type $C$, where $C = \max L(s)$, $1 \le s \le x$.

An explicit determination for $C$ has not been found. An approximate numerical determination is not difficult; the function

$$L(s) = \lim_{n\to\infty} \frac{\log \psi(s x^n)}{s^{\rho} d^n}$$

can be determined numerically for a number of values on some convenient interval $s_0 \le s \le s_0 x$, and the maximum value approximated. The importance of $C$ will be indicated in the conjecture following Theorem 4.3. We may also mention that the quantity $[\max L(s) - \min L(s)]$, $1 \le s \le x$, is of some interest. Some numerical work indicates that in certain cases $L(s)$ is at least approximately constant.
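The limit defining $L(s)$ can be evaluated without overflow by iterating the functional equation in two phases. The sketch below assumes the example $f(s) = 0.4s + 0.6s^2$ of section 5, for which $a = 0$, so that $k = f$, $\psi = \phi$, and $d = 2$. It starts from $\phi(t) \approx 1 + t + c_2 t^2$ at a tiny $t = s/x^m$, iterates $u \mapsto f(1 + u) - 1 = xu + 0.6u^2$ while $u = \phi - 1$ is small (avoiding cancellation), and then iterates $y = \log\phi$ through $\log f(e^y) = 2y + \log(0.6 + 0.4e^{-y})$. The depths $m$ and $n$ and the switch-over threshold are arbitrary choices.

```python
import math

P1, P2 = 0.4, 0.6                        # f(s) = P1*s + P2*s**2, so d = 2
X = P1 + 2 * P2                          # x = f'(1) = 1.6
RHO = math.log(2) / math.log(X)          # rho = log_x d
C2 = P2 / (X * X - X)                    # Taylor coefficient c_2 of phi

def L_approx(s, m=40, n=40):
    """Approximate L(s) = lim log phi(s x^n) / (s^rho d^n) for s > 0."""
    t = s / X**m
    u = t + C2 * t * t                   # u = phi(t) - 1 for tiny t
    steps = 0
    # Phase 1: phi(t x) - 1 = f(1 + u) - 1 = X*u + P2*u**2, free of cancellation.
    while u < 1e3 and steps < m + n:
        u = X * u + P2 * u * u
        steps += 1
    y = math.log1p(u)                    # y = log phi(t * X**steps)
    # Phase 2: log f(e^y) = 2y + log(P2 + P1*e^{-y}); exp(-y) underflows harmlessly to 0.
    while steps < m + n:
        y = 2 * y + math.log(P2 + P1 * math.exp(-y))
        steps += 1
    k = steps - m                        # y is now log phi(s * X**k)
    return y / (s**RHO * 2.0**k)

for s in (1.0, 1.2, 1.4, 1.6):
    print(s, L_approx(s))
```

Since $L(sx) = L(s)$, the values at $s = 1$ and $s = 1.6$ must coincide; both come out close to the value 0.744625 reported in section 5.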
4. Some properties of $G(u)$. Since it will be convenient to work with $H(u)$ rather than $G(u)$, we state the content of Theorems 4.1, 4.2, and 4.3 in terms of $G(u)$: $G(u) = a + \int_0^u g(v)\,dv$ for $u \ge 0$. The density $g(u)$ is continuous for $u \neq 0$. If $Ez_1^r < \infty$, then $g^{(r)}(u)$ is continuous for $u \neq 0$ provided $r < \gamma + k - 1$, and is continuous for $u = 0$ provided $r < \gamma - 1$. Near $u = 0$, $G(u)$, provided $\gamma < \infty$, approximates, in a certain mean sense made clear by Theorem 4.2, the function

$$a + \frac{(1 - a)^{\gamma + 1} u^{\gamma}}{\Gamma(\gamma + 1)}\,M[u(1 - a)],$$

where for convenience we have defined $M(u)$ for positive $u$ by $M(u) = M(-u)$. It is then shown that in a certain sense $g(u)$ goes to zero faster than $\exp(-u^{Q - \epsilon})$ and slower than $\exp(-u^{Q + \epsilon})$, where $\epsilon$ is any positive number, $Q$ being defined in Theorem 4.3. A conjecture is given of a more precise result, applicable when $f(s)$ is a polynomial: in the same sense $g(u)$ goes to zero (more, less) rapidly than $(\exp[-(A^* - \epsilon)u^Q],\ \exp[-(A^* + \epsilon)u^Q])$, where $A^*$ is defined in the conjecture.

Definition 4.1. Let $H'(u) = h(u)$.

Theorem 4.1. $H(u)$ is absolutely continuous. Theorem 3.3 shows that $H(u)$ is continuous; see [13], p. 25. This incidentally shows that $G(0) = a$. If $\gamma > \tfrac{1}{2}$ the absolute continuity of $H(u)$ follows from the Plancherel theorem; see any text on Fourier transforms. In any case, define the functions

$$h_m(u) = \frac{1}{2\pi}\int_{-m}^{m} e^{-iut}\,\psi(it)\,dt, \qquad m = 1, 2, \ldots.$$

An integration by parts² gives, for $u \neq 0$,

(4.1) $$h_m(u) = -\frac{1}{2\pi i u}\left[\psi(im)e^{-imu} - \psi(-im)e^{imu}\right] + \frac{1}{2\pi i u}\int_{-m}^{m} e^{-iut}\,\frac{d\psi(it)}{dt}\,dt.$$

If $0 < u_1 \le u \le u_2$, (4.1), (3.4), and (3.6) show that the continuous functions $h_m(u)$ converge uniformly in $[u_1, u_2]$ to a continuous function $h(u)$. Moreover

(4.2) $$H(u_2) - H(u_1) = \lim_{m\to\infty} \frac{1}{2\pi}\int_{-m}^{m} \frac{e^{-iu_1 t} - e^{-iu_2 t}}{it}\,\psi(it)\,dt = \lim_{m\to\infty}\int_{u_1}^{u_2} h_m(u)\,du = \int_{u_1}^{u_2} h(u)\,du,$$

the first equality in (4.2) following from [13], p. 28, and the second from the fact that the $h_m(u)$ are uniformly bounded for $u_1 \le u \le u_2$. In case $Ez_1^r < \infty$ and $r < \gamma + k - 1$, repeated integration by parts of (4.1) and reference to Remark (b), Theorem 3.3, shows that the first $r$ derivatives of $h(u)$ are continuous if $u \neq 0$. The usual integral expression for $h(u)$ in terms of $\psi(it)$ shows that $\gamma > r + 1$ implies $h^{(r)}(u)$ is continuous at 0.


² I am indebted to J. W. Tukey for this suggestion, which simplifies the original proof.
Corollary to the continuity of $H(u)$. The numbers $p_{nr} = P(z_n = r) \to 0$ uniformly in $r$, $r \ge 1$, as $n \to \infty$. We have

$$p_{nr} = \left[G_n\!\left(\frac{r}{x^n}\right) - G\!\left(\frac{r}{x^n}\right)\right] + \left[G\!\left(\frac{r}{x^n}\right) - G\!\left(\frac{r-1}{x^n}\right)\right] + \left[G\!\left(\frac{r-1}{x^n}\right) - G_n\!\left(\frac{r-1}{x^n}\right)\right].$$

The desired result follows because $G_n(u) \to G(u)$ uniformly for $u \ge 0$ and because $G(u)$ must be uniformly continuous for $0 \le u < \infty$ (right-continuity at 0).
We next consider the behavior of $H(u)$ near $u = 0$, when $\gamma < \infty$. Theorem 3.3 suggests what sort of result may be expected. If the function $M(s)$ of Theorem 3.3 were a constant $M$, it would follow from a Tauberian theorem due to Karamata (see [14], pp. 189-192) that

$$H(u) \sim \frac{M u^{\gamma}}{\Gamma(\gamma + 1)} \quad \text{as } u \to 0+, \qquad \text{or} \qquad \frac{H(u)}{u^{\gamma + 1}} \sim \frac{M}{u\,\Gamma(\gamma + 1)}.$$

Integrating both sides of this relation from $u$ to $ux$ would give

(4.3) $$\int_u^{ux} \frac{H(v)}{v^{\gamma + 1}}\,dv \sim \frac{M}{\Gamma(\gamma + 1)}\int_1^x \frac{dv}{v}.$$

The analogue of (4.3) turns out to be true, as shown by Theorem 4.2, which shows that in a certain mean sense $H(u)$ behaves like $\dfrac{M(u)\,u^{\gamma}}{\Gamma(\gamma + 1)}$. (We defined $M(u) = M(-u)$ for $u > 0$.)

Theorem 4.2.

$$\lim_{u\to 0+} \int_u^{ux} \frac{H(v)}{v^{\gamma + 1}}\,dv = \frac{1}{\Gamma(\gamma + 1)}\int_1^x \frac{M(v)}{v}\,dv.$$

The proof, which follows directly along the lines of the proof of Karamata's theorem, is sketched briefly in section 9, for a somewhat more general situation. A corollary of Theorem 4.2 is that if $\gamma < 1$, $h(u)$ cannot be bounded as $u \to 0+$, for $h(u) \le K$ implies

$$\lim_{u\to 0+} \int_u^{ux} \frac{Kv\,dv}{v^{\gamma + 1}} \ge \frac{1}{\Gamma(\gamma + 1)}\int_1^x \frac{M(v)}{v}\,dv > 0,$$

or

$$\lim_{u\to 0+} u^{1 - \gamma} > 0,$$

which implies $\gamma \ge 1$. An example to be given in section 5 shows that if $\gamma = 1$, $h(u)$ is at least in certain cases bounded but discontinuous at 0.

In order to consider the behavior of $H(u)$ as $u \to \infty$ we first prove a theorem which applies to any distribution whose m.g.f. is an entire function.

Theorem 4.3.³ Let $F(u)$ be any c.d.f. whose m.g.f. $\xi(s)$ is entire. Let $\rho$ be the order of $\xi(s)$. Let $Q$ be defined by

$$Q = \text{l.u.b. } q:\ \int_{-\infty}^{\infty} e^{|u|^q}\,dF(u) < \infty.$$


³ Before completing the present proof, the writer communicated this result to R. P. Boas, Jr., who sent back a proof along different lines.
The proof is given in section 9.

Combining Theorems 3.4 and 4.3, we obtain immediately

Theorem 4.4. Let $Q = \text{l.u.b. } q:\ \int_0^{\infty} e^{u^q} h(u)\,du < \infty$. Then $Q = \dfrac{\rho}{\rho - 1}$.

Here $\rho$ is given by Definition 3.4. If $f(s)$ is not a polynomial, whether entire or not, the proof of Theorem 4.3 will show that $Q = 1$, and we interpret Theorem 4.4 in that sense. The trivial case $f(s) = s^d$ is excluded, so $\rho > 1$.

Conjecture. Let $\xi(s)$ of Theorem 4.3 be of finite order $\rho$ and of type $C$, $0 < C < \infty$. Let $Q = \dfrac{\rho}{\rho - 1}$ and $A = \text{l.u.b. } A':\ \int_{-\infty}^{\infty} e^{A'|u|^Q}\,dF(u) < \infty$. Then $(C\rho)^Q (AQ)^{\rho} = 1$.

The proof for the case $\rho$ rational follows the same lines as the proof of Theorem 4.3; a general proof has not been found. If the conjecture is true then, having determined $\rho$ and $Q$ when $k(s)$ is a polynomial, and having estimated $C$ by the procedure indicated following the corollary to Theorem 3.4, we obtain

(4.4) $$A = \frac{(C\rho)^{-Q/\rho}}{Q}$$

for the l.u.b. of the numbers $A'$ such that $\int_0^{\infty} e^{A'u^Q} h(u)\,du < \infty$. The corresponding number $A^*$ which applies to $g(u)$ is given by

(4.5) $A^* = A(1 - a)^Q$.


5. Some special cases. In this section we shall discuss some special cases in which the m.g.f. $\phi(s)$ and the c.d.f. $G(u)$ may be determined explicitly. For these cases and for certain others there is a close relationship between the simple discrete branching process and another type of model to be discussed in section 8. Finally a numerical computation of the distribution $G(u)$ will be given for a particular case where $f(s)$ is a second-degree polynomial.

Suppose $f(s)$ has the form

$$f(s) = \frac{(a - x + 1) + (x - a)s}{1 + a - as},$$

with $x > 1$, $a > x - 1$ (in this paragraph $a$ denotes a parameter of $f$), where $f'(1) = x$ and $f''(1) + f'(1) = Ez_1^2 = x(1 + 2a)$. It is easily verified (as pointed out by Poincaré in [11]) that the solution of the equation $\phi(sx) = f[\phi(s)]$ with $\phi(0) = \phi'(0) = 1$ is given by

$$\phi(s) = 1 + \frac{(x - 1)s}{x - 1 - as}.$$

The smallest non-negative root of $t = f(t)$, that is, the extinction probability of Definition 2.3, is $(a - x + 1)/a$. The functions $\psi(s)$ and $k(s)$ of section 3 are given by

$$\psi(s) = \frac{1}{1 - s}, \qquad k(s) = \frac{s}{x - (x - 1)s}.$$

The number $\gamma$ of Theorem 3.3 is 1. The density function $h(u)$ (Definition 4.1) is simply $e^{-u}$, as seen by direct calculation. The number $Q$ of Theorem 4.3 is 1, as it should be, since $f(s)$ is not an entire function. The c.d.f. $H(u)$ is $1 - e^{-u}$, and $H(u) \sim u$ near $u = 0$, in agreement with Theorem 4.2. Various aspects of the case $f(s) = (As + B)/(Cs + D)$ have been discussed by numerous authors.

Somewhat more generally, we may consider generating functions of the form

(5.1) $$k(s) = s\,[x - (x - 1)s^m]^{-1/m}, \qquad x > 1.$$

The function $k(s)$ is a generating function if and only if $m$ is a positive integer. In this case we have $\phi(s) = \psi(s) = (1 - ms)^{-1/m}$ and

$$g(u) = h(u) = \frac{u^{(1/m) - 1} e^{-u/m}}{m^{1/m}\,\Gamma(1/m)}.$$

Here $\gamma = 1/m$, and we note that unless $m = 1$ the density function $h(u)$ is unbounded near $u = 0$. A physical interpretation for this case will be given in section 8.
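The closed forms of this section are easy to confirm numerically. The sketch below checks, on a grid of real $s \le 0$, that $\phi(s) = 1 + (x - 1)s/(x - 1 - as)$ satisfies (2.7) for the fractional linear $f$ written above, that $(a - x + 1)/a$ is a fixed point of $f$ (named `alpha` in the code to avoid a clash with the parameter $a$), that $k(s) = s/(x - (x - 1)s)$ agrees with Definitions 3.1, that $\psi(s) = 1/(1 - s)$ satisfies (3.1), and that $\psi(s) = (1 - ms)^{-1/m}$ satisfies (3.1) for the $k(s)$ of (5.1). The parameter values and grids are arbitrary illustrative choices.

```python
import numpy as np

S_NEG = np.linspace(-3.0, 0.0, 31)      # real s <= 0, where the m.g.f.'s are used
S_UNIT = np.linspace(0.0, 1.0, 11)      # arguments of generating functions

def fractional_linear_errors(x, a):
    """Residuals of the formulas for f(s) = [(a - x + 1) + (x - a)s] / (1 + a - as)."""
    f = lambda s: ((a - x + 1) + (x - a) * s) / (1 + a - a * s)
    phi = lambda s: 1 + (x - 1) * s / (x - 1 - a * s)
    alpha = (a - x + 1) / a                               # extinction probability
    k_closed = lambda s: s / (x - (x - 1) * s)
    k_from_def = lambda s: (f(alpha + (1 - alpha) * s) - alpha) / (1 - alpha)
    psi = lambda s: 1 / (1 - s)
    return [
        np.max(np.abs(phi(S_NEG * x) - f(phi(S_NEG)))),         # (2.7)
        abs(f(alpha) - alpha),                                   # fixed point of f
        np.max(np.abs(k_from_def(S_UNIT) - k_closed(S_UNIT))),   # Definitions 3.1
        np.max(np.abs(psi(S_NEG * x) - k_closed(psi(S_NEG)))),   # (3.1)
    ]

def power_case_error(x, m):
    """Residual of (3.1) for k(s) = s[x - (x-1)s^m]^(-1/m) and psi(s) = (1 - ms)^(-1/m)."""
    k = lambda s: s * (x - (x - 1) * s**m) ** (-1.0 / m)
    psi = lambda s: (1 - m * s) ** (-1.0 / m)
    return np.max(np.abs(psi(S_NEG * x) - k(psi(S_NEG))))

for x, a in [(2.0, 2.0), (1.5, 0.8), (3.0, 5.0)]:
    print("fractional linear", x, a, max(fractional_linear_errors(x, a)))
for x, m in [(1.6, 1), (1.6, 2), (2.5, 3)]:
    print("power case", x, m, power_case_error(x, m))
```

The case $x = a = 2$ is the geometric distribution $f(s) = 1/(3 - 2s)$, with extinction probability $1/2$.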

As a numerical illustration we consider the case $f(s) = 0.4s + 0.6s^2$. We have $x = Ez_1 = 1.6$ and $\sigma^2 = E(z_1 - x)^2 = 0.24$. For the asymptotic distribution, $Ew = 1$, $E(w - 1)^2 = \sigma^2/(x^2 - x) = 0.25$. The number $\gamma = \log 2.5/\log 1.6 = 1.9495$, so that $\psi(s)$, which is identical with $\phi(s)$ in this case, is $O(|s|^{-1.9495})$ as $|s|$ goes to $\infty$ with $\operatorname{Re}(s) \le 0$. This implies that the c.d.f. $H(u)$, and likewise $G(u)$, since the two are equal here, behaves like $[1/\Gamma(1 + \gamma)]M(u)$ times $u^{1.9495}$ near $u = 0$, where the "behavior" is in the sense of Theorem 4.2. Numerical determination of $M(u)$ would not be difficult. The number $\rho$ of Theorem 4.4 is given by $\log_{1.6} 2 = 1.4748$. This means that $\psi(s)$ is an entire function of order 1.4748, and hence that the density function $h(u)$ goes to zero more rapidly than $e^{-u^{Q - \epsilon}}$ and less rapidly than $e^{-u^{Q + \epsilon}}$ for any $\epsilon > 0$, where $Q = \rho/(\rho - 1) = 3.1061$, and "more rapidly" is used in the sense of Theorem 4.4.

The function $L(s) = \lim_{n\to\infty} \log\psi(s x^n)/(s^{\rho} 2^n)$ was computed for four values of $s$ between $s = 1$ and $s = x = 1.6$; in each case the value was 0.744625, so that it appears likely that here $L(s)$ is constant. Hence $C = \max L(s) = 0.744625$, and the quantity $A$ defined by (4.4) is 0.26430. Thus the conjecture following Theorem 4.4 indicates that $\int_0^{\infty} g(u)\,e^{(0.26430 \pm \epsilon)u^{3.1061}}\,du$ is (divergent, convergent) according as the $+$ or $-$ sign holds.
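The constants quoted in the two paragraphs above follow from the definitions with a few lines of arithmetic. The sketch below recomputes $x$, $\sigma^2$, $\operatorname{Var}(w)$ from (2.6), $\gamma$ (Definition 3.3 with $a = 0$, so that $f'(a) = p_1$), $\rho$ (Definition 3.4 with $d = 2$), $Q = \rho/(\rho - 1)$, and $A$ from (4.4), using the reported $C = 0.744625$ as input.

```python
import math

p = {1: 0.4, 2: 0.6}                              # f(s) = 0.4 s + 0.6 s^2
x = sum(r * pr for r, pr in p.items())            # Ez_1
sigma2 = sum(r * r * pr for r, pr in p.items()) - x * x
var_w = sigma2 / (x * x - x)                      # E(w - 1)^2 from (2.6)
gamma = math.log(1 / p[1]) / math.log(x)          # a = 0, so f'(a) = p_1
rho = math.log(2) / math.log(x)                   # polynomial of degree d = 2
Q = rho / (rho - 1)
C = 0.744625                                      # value of L(s) reported above
A = (C * rho) ** (-Q / rho) / Q                   # (4.4)
print(f"x={x:.4f} sigma2={sigma2:.4f} Var(w)={var_w:.4f}")
print(f"gamma={gamma:.4f} rho={rho:.4f} Q={Q:.4f} A={A:.5f}")
```

These agree with the printed values to the precision quoted (for $Q$, to about three decimal places). By construction, $A$ satisfies the conjectured relation $(C\rho)^Q (AQ)^{\rho} = 1$.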

Through the kindness of Mr. Cecil Hastings of the Douglas Aircraft Company, the c.d.f. $G(u)$ was computed for this case. The coefficients in the power series expansion of $\phi(s)$ were obtained from the functional equation (3.1), and $G(u)$ was then obtained by inverting. The values of $G(u)$ are given in Table I.
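Table I can be cross-checked by simulation, since by Theorem 2.1 $w_n = z_n/x^n$ is close to $w$ in mean square once $n$ is moderately large (here $E(w - w_n)^2 = 0.25\,x^{-n}$). The sketch below uses NumPy's random generator and the fact that, for $f(s) = 0.4s + 0.6s^2$, each individual has one son plus a second son with probability 0.6, so $z_{n+1} = z_n + \text{Binomial}(z_n, 0.6)$. The number of generations, sample size, seed, and tolerance are arbitrary choices; this is a Monte Carlo check, not the inversion method used for Table I.

```python
import numpy as np

TABLE_I = {0.25: 0.04753, 0.50: 0.17275, 0.75: 0.34550, 1.00: 0.53117,
           1.25: 0.69932, 1.50: 0.83042, 1.75: 0.91857, 2.00: 0.96781,
           2.50: 0.99751, 3.00: 0.99993}

def simulate_w(n_gen=20, n_paths=20_000, seed=0):
    """Samples of w_n = z_n / x^n for f(s) = 0.4 s + 0.6 s^2 (so x = 1.6)."""
    rng = np.random.default_rng(seed)
    z = np.ones(n_paths, dtype=np.int64)            # z_0 = 1
    for _ in range(n_gen):
        z = z + rng.binomial(z, 0.6)                # each individual: 1 son, or 2 w.p. 0.6
    return z / 1.6**n_gen

w = simulate_w()
print("mean", w.mean(), "variance", w.var())        # (2.6) gives 1 and 0.25
for u, g in TABLE_I.items():
    print(f"u={u:.2f}  Table I: {g:.5f}  simulated: {np.mean(w <= u):.5f}")
```

With 20 000 paths the sampling error of each simulated probability is below about 0.004, so agreement to two decimal places is what one should expect.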
6. Number of generations to extinction. It was pointed out in section 2 that when $x \le 1$ the probability is 1 that $z_n = 0$ for some integer $n$. We assume throughout section 6 that $x < 1$.


TABLE I

G(u), the limiting probability that $z_n/x^n < u$, for the case $f(s) = 0.4s + 0.6s^2$

     u       G(u)
   0.00     .00000
   0.25     .04753
   0.50     .17275
   0.75     .34550
   1.00     .53117
   1.25     .69932
   1.50     .83042
   1.75     .91857
   2.00     .96781
   2.50     .99751
   3.00     .99993


Definitions 6.1. Let the random variable $N$ be the smallest integer $n$ such that $z_{n+1} = 0$. Define the moment-generating function of $N$ by

$\theta(s) = \sum_{n=0}^{\infty} e^{ns} P(N = n)$.

Clearly $P(N = n) = p_{n+1,0} - p_{n0}$, so that $\theta(s) = \sum_{n=0}^{\infty} e^{ns}(p_{n+1,0} - p_{n0})$.

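Definition 6.2. Let $b_n = 1 - p_{n+1,0}$, with $b_0 = 1 - p_0$. The numbers $b_n$ satisfy the recursive relation
$$b_{n+1} = 1 - f(1 - b_n). \tag{6.1}$$
Define the function $\theta_1(s)$ by
$$\theta_1(s) = \sum_{n=0}^{\infty} b_n e^{ns}.$$
Since $P(N = n) = b_{n-1} - b_n$, with $b_{-1} = 1$, we see that
$$\theta(s) = 1 + (e^s - 1)\,\theta_1(s), \tag{6.2}$$
so that it suffices to determine the function $\theta_1(s)$.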
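Relations (6.1) and (6.2) are easy to check numerically. The sketch below uses an illustrative quadratic $f$ (the values $p_0 = 0.5$, $p_1 = 0.4$, $p_2 = 0.1$ are ours, not the paper's). It generates the $b_n$ and recovers $P(N = n) = b_{n-1} - b_n$. It then compares both sides of (6.2) and the two expressions for $E(N)$. For this quadratic $f$, $1 - f(1 - b)$ is evaluated in the algebraically equal form $xb - p_2 b^2$, which avoids cancellation when $b$ is tiny. All series are truncated after a fixed number of terms.

```python
import math

def extinction_b(p0, p1, p2, n_terms=400):
    """b_n = 1 - p_{n+1,0} for f(s) = p0 + p1 s + p2 s^2, starting from b_0 = 1 - p0.

    The update is (6.1), b_{n+1} = 1 - f(1 - b_n), written as x b - p2 b^2
    (identical for a quadratic f) to avoid floating-point cancellation.
    """
    x = p1 + 2 * p2
    b = [1.0 - p0]
    for _ in range(n_terms - 1):
        b.append(x * b[-1] - p2 * b[-1] ** 2)
    return b

p0, p1, p2 = 0.5, 0.4, 0.1           # illustrative values; x = p1 + 2 p2 = 0.6 < 1
b = extinction_b(p0, p1, p2)

# P(N = n) = p_{n+1,0} - p_{n,0} = b_{n-1} - b_n, with b_{-1} = 1 since z_0 = 1.
pmf = [prev - cur for prev, cur in zip([1.0] + b[:-1], b)]

s = 0.1                               # any real s with x * e^s < 1
theta = sum(math.exp(n * s) * q for n, q in enumerate(pmf))
theta1 = sum(math.exp(n * s) * bn for n, bn in enumerate(b))
print(theta, 1 + (math.exp(s) - 1) * theta1)            # the two sides of (6.2)
print(sum(n * q for n, q in enumerate(pmf)), sum(b))    # E(N) computed two ways
```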

The function $\theta_1(s)$ belongs to a type which has been studied by Fatou [15] and Lattès [16]. If we let $e^s = z$, we see that $\theta_1$ is a power series in $z$ whose coefficients are successive iterates of the function $f^*(b) = 1 - f(1 - b)$; i.e., $b_{n+1} = f^*(b_n) = f^*_{n+1}(b_0)$, where $f^*(0) = 0$ and $f^{*\prime}(0) = x < 1$. Fatou showed that a function of this sort is meromorphic, with poles at $s = -n\log x$, $n = 1, 2, \ldots$. Lattès obtained an expansion for $\theta_1(s)$ of the form
$$\theta_1(s) = \frac{\mu_1 y_0}{1 - x e^s} + \frac{\mu_2 y_0^2}{1 - x^2 e^s} + \frac{\mu_3 y_0^3}{1 - x^3 e^s} + \cdots,$$
the expansion converging everywhere except at the poles. The quantities $\mu_i$ and $y_0$ are defined as follows. The function $\mu(s) = \mu_1 s + \mu_2 s^2 + \mu_3 s^3 + \cdots$ is determined by the functional equation $\mu(sx) = f^*[\mu(s)]$ with the condition $\mu'(0) = \mu_1 = 1$. The number $y_0$ is determined by $\mu(y_0) = b_0 = 1 - p_0$. Perhaps the easiest way to determine $y_0$ is to use the fact that the inverse function $\mu^{-1}(s)$ satisfies the functional equation $\mu^{-1}[f^*(s)] = x\,\mu^{-1}(s)$. From this we can determine the power series for $\mu^{-1}(b_0)$.

The use of Lattès' expansion requires finding the expansions of $\mu(s)$ and $\mu^{-1}(s)$. We therefore give another method, which yields a different kind of expansion. This method appears particularly well suited to the case illustrated here, where $f(s)$ is of the second degree. Then (6.1) becomes
$$b_{n+1} = x b_n - p_2 b_n^2, \qquad b_0 = 1 - p_0. \tag{6.3}$$

Definition 6.3. The functions $\theta_k(s)$, $k = 1, 2, \ldots$, are given by
$$\theta_k(s) = \sum_{n=0}^{\infty} (b_n)^k e^{ns}. \tag{6.4}$$

Raise both sides of (6.3) to the $k$th power, multiply both sides by $e^{ns}$, sum on $n$ from 0 to $\infty$, and solve for $\theta_k(s)$. We obtain
$$\theta_k(s) = \frac{b_0^k e^{-s} + \sum_{j=1}^{k} \binom{k}{j} (-p_2)^j x^{k-j}\, \theta_{k+j}(s)}{e^{-s} - x^k}. \tag{6.5}$$


(Justification for the rearrangement of series will come out of the subsequent proof.) If we put $k = 1$ in (6.5), we obtain
$$\theta_1(s) = \frac{b_0 e^{-s} - p_2\,\theta_2(s)}{e^{-s} - x}. \tag{6.6}$$


Definition 6.4. We define recursively sequences of functions $S_n(s)$ and $R_n(s)$ such that, for each $n$, $\theta_1(s) = S_n(s) + R_n(s)$. Let
$$S_1(s) = \frac{b_0 e^{-s}}{e^{-s} - x}, \qquad R_1(s) = -\frac{p_2\,\theta_2(s)}{e^{-s} - x}.$$
Suppose now that $R_n(s)$ is of the form $A_{n1}\theta_{n+1}(s) + \cdots + A_{nn}\theta_{2n}(s)$. Here the $A_{nj}$ are functions of $s$, $p_2$ and $x$, but not explicitly of $b_0$, while $S_n(s)$ is a rational function of $e^{-s}$, $p_2$ and $x$, and a polynomial of degree $n$ in $b_0$. Now put $k = n + 1$ in (6.5) and substitute the resulting expression for $\theta_{n+1}(s)$ into $R_n(s)$. After collecting terms, define $R_{n+1}(s)$ as the sum of the terms involving $\theta_{n+2}(s), \ldots, \theta_{2n+2}(s)$:
$$R_{n+1}(s) = A_{n+1,1}\theta_{n+2}(s) + \cdots + A_{n+1,n+1}\theta_{2n+2}(s).$$
Then $S_{n+1}(s) = \theta_1(s) - R_{n+1}(s)$ is a rational function of $e^{-s}$, $p_2$ and $x$, and a polynomial of degree $n + 1$ in $b_0$.
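The recursion of Definition 6.4 can be run numerically at a fixed value of $s$. In the sketch below, $R_n(s)$ is stored as a map from the index $k$ to the numerical coefficient of $\theta_k(s)$. At each stage the code removes the lowest-index $\theta_{n+1}$ using (6.5) and adds the resulting multiple of $b_0^{n+1}e^{-s}$ to $S_n(s)$. The result is compared with a truncated direct sum of $b_n e^{ns}$. The function names and parameter values are our own hypothetical choices, picked to satisfy (6.8) and $x + p_2 b_0 < 1$.

```python
import math

def s_n_sequence(s, x, p2, b0, stages):
    """Values S_1(s), ..., S_stages(s) of Definition 6.4, evaluated numerically.

    theta_1 = S + sum_k R[k] * theta_k. Each stage removes the lowest-index
    theta_k from R by means of (6.5):
      theta_k = [b0**k e^{-s} + sum_{j=1}^k C(k,j) (-p2)**j x**(k-j) theta_{k+j}]
                / (e^{-s} - x**k).
    """
    e = math.exp(-s)
    S, R, values = 0.0, {1: 1.0}, []      # start from theta_1 = 1 * theta_1
    for _ in range(stages):
        k = min(R)                          # k = n + 1 at stage n
        a = R.pop(k)
        denom = e - x**k
        S += a * b0**k * e / denom
        for j in range(1, k + 1):
            coef = a * math.comb(k, j) * (-p2) ** j * x ** (k - j) / denom
            R[k + j] = R.get(k + j, 0.0) + coef
        values.append(S)
    return values

def theta1_direct(s, x, p2, b0, n_terms=500):
    """Truncated theta_1(s) = sum_n b_n e^{ns}, with b_n generated by (6.3)."""
    total, b = 0.0, b0
    for n in range(n_terms):
        total += b * math.exp(n * s)
        b = x * b - p2 * b * b
    return total

# Illustrative values: x = 0.6, p2 = 0.1, b0 = 0.5, so x + p2*b0 = 0.65 < 1.
S = s_n_sequence(0.1, 0.6, 0.1, 0.5, stages=30)
print(S[:3], S[-1], theta1_direct(0.1, 0.6, 0.1, 0.5))
```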

Theorem 6.1. Let $f(s) = p_0 + p_1 s + p_2 s^2$, with $x < 1$. Suppose that $x + p_2 b_0 < 1$. Then the functions $S_n(s)$ converge to $\theta_1(s)$ in a neighborhood of $s = 0$.

The restriction $x + p_2 b_0 < 1$ may fail to hold. However, this is not a serious restriction. Since $b_n \to 0$, we can pick a value of $n$ so that $x + p_2 b_n < 1$. Then
$$\theta_1(s) = b_0 + b_1 e^{s} + \cdots + b_{n-1} e^{(n-1)s} + e^{ns}\,\theta_1^*(s),$$
where $\theta_1^*(s) = \sum_{j=n}^{\infty} b_j e^{(j-n)s}$ is the same type of function as $\theta_1(s)$, with $b_n$ in place of $b_0$. Theorem 6.1 is then applicable to $\theta_1^*(s)$.
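Finding the index $n$ needed for this reduction takes only a short loop over (6.3). The function name and the parameter values in this sketch are illustrative assumptions. The values are chosen so that $x < 1$ but $x + p_2 b_0 > 1$.

```python
def reduction_index(p0, p1, p2, max_n=10_000):
    """Smallest n with x + p2 * b_n < 1, where b_{n+1} = x b_n - p2 b_n^2 as in (6.3)."""
    x = p1 + 2 * p2
    if x >= 1:
        raise ValueError("section 6 assumes x < 1")
    b = 1.0 - p0
    for n in range(max_n):
        if x + p2 * b < 1:
            return n, b
        b = x * b - p2 * b * b
    raise RuntimeError("no suitable n found within max_n steps")

# Illustrative values: x = 0.9 < 1, but x + p2 * b_0 = 1.1025 > 1.
n, b_n = reduction_index(0.55, 0.0, 0.45)
print(n, b_n)   # theta_1 is then split after its first n terms
```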

If the conditions of Theorem 6.1 are satisfied, we have
$$\theta_1(s) = b_0 e^{-s}\big[\pi_1(s,x) - p_2 b_0\,\pi_2(s,x) + 2x p_2^2 b_0^2\,\pi_3(s,x) - p_2^3 b_0^3 (e^{-s} + 5x^3)\,\pi_4(s,x) + \cdots\big], \tag{6.7}$$
where $\pi_k(s,x) = \prod_{r=1}^{k} (e^{-s} - x^r)^{-1}$. Since $E(N) = \theta'(0) = \theta_1(0)$ and $E(N^2) = \theta''(0) = 2\theta_1'(0) + \theta_1(0)$, we have
$$E(N) = b_0\big[\pi_1(0,x) - p_2 b_0\,\pi_2(0,x) + 2x p_2^2 b_0^2\,\pi_3(0,x) - p_2^3 b_0^3 (1 + 5x^3)\,\pi_4(0,x) + \cdots\big],$$
$$E(N^2) = -E(N) + 2b_0\big[\pi_1'(0,x) - p_2 b_0\,\pi_2'(0,x) + 2x p_2^2 b_0^2\,\pi_3'(0,x) - (5x^3 + 1)\,p_2^3 b_0^3\,\pi_4'(0,x) + p_2^3 b_0^3\,\pi_4(0,x) + \cdots\big],$$
where $\pi_k'(0,x) = \pi_k(0,x) \sum_{r=1}^{k} \dfrac{1}{1 - x^r}$.
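The four displayed terms of (6.7) already give good approximations when $p_2 b_0$ is small. The sketch below evaluates the truncated series for $E(N)$ and $E(N^2)$. It compares them with the direct sums $E(N) = \sum b_n$ and $E(N^2) = \sum b_n + 2\sum n b_n$, which follow from $\theta_1(0)$ and $\theta_1'(0)$. The parameter values and the choice to stop after four terms are our own illustrative assumptions.

```python
def pi_k(k, x):
    """pi_k(0, x) = prod_{r=1}^k 1 / (1 - x^r)."""
    out = 1.0
    for r in range(1, k + 1):
        out /= 1.0 - x**r
    return out

def pi_k_prime(k, x):
    """pi'_k(0, x) = pi_k(0, x) * sum_{r=1}^k 1 / (1 - x^r)."""
    return pi_k(k, x) * sum(1.0 / (1.0 - x**r) for r in range(1, k + 1))

def moments_series(x, p2, b0):
    """E(N) and E(N^2) from the first four terms of (6.7) at s = 0."""
    c = p2 * b0
    EN = b0 * (pi_k(1, x) - c * pi_k(2, x) + 2 * x * c**2 * pi_k(3, x)
               - c**3 * (1 + 5 * x**3) * pi_k(4, x))
    EN2 = -EN + 2 * b0 * (pi_k_prime(1, x) - c * pi_k_prime(2, x)
                          + 2 * x * c**2 * pi_k_prime(3, x)
                          - (5 * x**3 + 1) * c**3 * pi_k_prime(4, x)
                          + c**3 * pi_k(4, x))
    return EN, EN2

def moments_direct(x, p2, b0, n_terms=2000):
    """E(N) = sum b_n and E(N^2) = sum b_n + 2 sum n b_n, with b_n from (6.3)."""
    s0 = s1 = 0.0
    b = b0
    for n in range(n_terms):
        s0 += b
        s1 += n * b
        b = x * b - p2 * b * b
    return s0, s0 + 2 * s1

# Illustrative values with x + p2*b0 = 0.65 < 1.
print(moments_series(0.6, 0.1, 0.5), moments_direct(0.6, 0.1, 0.5))
```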

We now prove that if $x + p_2 b_0 < 1$, the expansion (6.7) is valid in some neighborhood of $s = 0$. We denote the particular values of $x$, $p_2$ and $b_0$ with which we are dealing by $\bar{x}$, $\bar{p}_2$ and $\bar{b}_0$. Now let $x$, $p_2$ and $b_0$ be three complex numbers, arbitrary except for the following restrictions:
$$|x| + |p_2| < 1, \qquad |b_0| < 1. \tag{6.8}$$
Define the numbers $b_n$ in terms of $b_0$, $x$ and $p_2$ by means of (6.3), with $\theta_k(s)$ defined by (6.4).

We first show that (6.7) is valid if (6.8) holds. We then show that the domain of validity also includes the original numbers $\bar{x}$, $\bar{p}_2$ and $\bar{b}_0$, provided $\bar{x} + \bar{p}_2 \bar{b}_0 < 1$.

If (6.8) is satisfied, we have $|b_n| < A|x|^n$, where $A$ is a positive constant. Now suppose $1 < T < 1/|x|$. Then the series defining $\theta_k(s)$, $k = 1, 2, \ldots$, are uniformly and absolutely convergent in the domain $|e^s| < T$. Moreover, if $|x| + |p_2| = \Lambda < 1$, we have $|b_n| \le |b_0|\Lambda^n$. Hence, if $k$ is an integer large enough that $T\Lambda^k < \tfrac12$,
$$|\theta_k(s)| < 2|b_0|^k \tag{6.9}$$
for $|e^s| < T$. In what follows we assume $|e^s| < T$. Now write
$$\theta_1(s) = S_n(s) + \sum_{j=1}^{n} A_{nj}(p_2, x, s)\,\theta_{n+j}(s),$$
where $n$ is large enough that $T\Lambda^n < \tfrac12$. Let $\bar{A}_n(p_2, x, s) = \max_{1 \le j \le n} |A_{nj}(p_2, x, s)|$. Passing to the next stage, we see that
$$\bar{A}_{n+1} \le \bar{A}_n + \frac{\Lambda^{n+1}}{|e^{-s} - x^{n+1}|}\,\bar{A}_n = \bar{A}_n\left(1 + \frac{\Lambda^{n+1}}{|e^{-s} - x^{n+1}|}\right).$$
Since $|e^{-s} - x^{n+1}| > T^{-1} - |x| > 0$ and $\sum_n \Lambda^n < \infty$, the numbers $\bar{A}_n$ are bounded. This fact, together with (6.9), shows that $\lim_{n\to\infty} R_n(s) = 0$.

Now suppose that $x$ and $b_0$ have their original values $\bar{x}$ and $\bar{b}_0$, while $p_2$ is small enough in absolute value that $\bar{x} + |p_2| < 1$. In this case $\lim_{n\to\infty} S_n(s) = \theta_1(s)$. We observe that $S_n(s)$ is a polynomial of degree $n - 1$ in $p_2$. We also observe that $S_{n+1}(s)$ is obtained from $S_n(s)$ by adding a single term of degree $n$ in $p_2$. Thus $\theta_1(s)$ has been expressed as a power series in $p_2$.

Now consider $\theta_1(s)$ as a function of $p_2$, with $b_0 = \bar{b}_0$ and $x = \bar{x}$. If $\bar{x} + \bar{b}_0|p_2| < 1$, we have $b_n = O[(\bar{x} + \bar{b}_0|p_2|)^n]$. Thus $\theta_1(s)$ is analytic in $p_2$ for $|p_2| < (1 - \bar{x})/\bar{b}_0$. The expansion (6.7), being a power series in $p_2$, must therefore be valid when $\bar{x} + \bar{p}_2\bar{b}_0 < 1$.
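The two bounds used in the proof, $|b_n| \le |b_0|\Lambda^n$ and (6.9), can be spot-checked for complex parameters satisfying (6.8). The particular values of $x$, $p_2$, $b_0$, $T$ and $s$ below are arbitrary assumptions, and the function name is hypothetical. The check illustrates the inequalities but of course does not prove them.

```python
import cmath

def check_proof_bounds(x, p2, b0, s, T, n_terms=400):
    """Spot-check |b_n| <= |b_0| Lam^n and (6.9) for complex values satisfying (6.8).

    Returns (bound_on_b_holds, k, bound_6_9_holds), where k is the smallest
    integer with T * Lam**k < 1/2.
    """
    lam = abs(x) + abs(p2)
    if not (lam < 1 and abs(b0) < 1 and 1 < T < 1 / abs(x) and abs(cmath.exp(s)) < T):
        raise ValueError("parameters violate (6.8) or the conditions on T and s")
    b = [b0]
    for _ in range(n_terms - 1):
        b.append(x * b[-1] - p2 * b[-1] ** 2)       # recursion (6.3)
    bound_b = all(abs(bn) <= abs(b0) * lam**n * (1 + 1e-12) for n, bn in enumerate(b))
    k = 1
    while T * lam**k >= 0.5:
        k += 1
    theta_k = sum(bn**k * cmath.exp(n * s) for n, bn in enumerate(b))   # (6.4), truncated
    return bound_b, k, abs(theta_k) < 2 * abs(b0) ** k

print(check_proof_bounds(0.5 + 0.2j, 0.3 - 0.1j, 0.6 + 0.3j, cmath.log(1.3) + 0.7j, T=1.4))
```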


7. Estimation of parameters. Until now we have assumed that the parameters $p_r$ are known numbers. We may wish, however, to estimate them, having observed the numbers $z_1, z_2, \ldots, z_{n+1}$. To obtain simple maximum likelihood estimates for the $p_r$, it appears necessary to introduce certain auxiliary random variables.

Definition 7.1. Let $z_{mk}$ be the number of individuals in the $m$th generation who have exactly $k$ descendants in the $(m + 1)$st generation. Let $Z_n = z_0 + z_1 + \cdots + z_n$.

Theorem 7.1. Maximum likelihood estimates of $p_r$ and $x$, based on observed values of $z_{mk}$ for $m \le n$, are, respectively,
$$\hat{p}_r = \sum_{m=0}^{n} z_{mr} \Big/ Z_n, \qquad \hat{x} = (Z_{n+1} - 1)/Z_n.$$
(Note that the estimate $\hat{x}$ involves only $z_1, \ldots, z_{n+1}$.)

If $z_m$ is fixed, the joint conditional probability function of $z_{m0}, z_{m1}, \ldots$ is
$$\frac{z_m!}{\prod_r z_{mr}!} \prod_r p_r^{z_{mr}}.$$
Thus the joint probability function of the $z_{mr}$ for $m = 0, 1, \ldots, n$ and $r = 0, 1, 2, \ldots$ is the product of two factors. One factor is independent of the $p_r$; the logarithm of the other is
$$\sum_{r} \Big(\sum_{m=0}^{n} z_{mr}\Big) \log p_r.$$
This expression is clearly maximized, subject to $\sum_r p_r = 1$, by taking $p_r = \hat{p}_r$ as given above. Since $\sum_r z_{mr} = z_m$ and $\sum_r r z_{mr} = z_{m+1}$, the quantity $\sum_r r\hat{p}_r$ gives $\hat{x}$ as above.
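A simulated population makes the estimates of Theorem 7.1 concrete. In this sketch, each generation records the counts $z_{mk}$, and the code confirms the identity $\sum_r r\hat{p}_r = (Z_{n+1} - 1)/Z_n$ used above. The offspring probabilities, the seed and the helper names are hypothetical choices made for the illustration.

```python
import random
from collections import Counter

def simulate_counts(p, generations, seed=0):
    """Simulate a Galton-Watson process from z_0 = 1 with offspring law p.

    Returns (z, counts): z[m] = z_m for m = 0, ..., generations + 1, and
    counts[m][k] = z_{mk}, the number of generation-m individuals with k offspring.
    """
    rng = random.Random(seed)
    z, counts = [1], []
    for m in range(generations + 1):
        offspring = rng.choices(range(len(p)), weights=p, k=z[m])
        counts.append(Counter(offspring))
        z.append(sum(offspring))
    return z, counts

def ml_estimates(z, counts, n):
    """Estimates of Theorem 7.1 from z_{mk}, m <= n (assumes z_0 = 1)."""
    Z_n = sum(z[: n + 1])
    Z_n1 = Z_n + z[n + 1]
    r_max = max(max(c) for c in counts[: n + 1] if c)
    p_hat = [sum(c[r] for c in counts[: n + 1]) / Z_n for r in range(r_max + 1)]
    x_hat = (Z_n1 - 1) / Z_n
    return p_hat, x_hat

z, counts = simulate_counts([0.2, 0.3, 0.5], generations=12, seed=3)
p_hat, x_hat = ml_estimates(z, counts, n=12)
print(p_hat, x_hat, sum(r * q for r, q in enumerate(p_hat)))
```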

The estimates $\hat{p}_r$ are the same as we would obtain if we were dealing with $Z_n$ trials from a multinomial distribution with probabilities $p_r$. Nevertheless, the joint distribution of the quantities $\sum_{m=0}^{n} z_{mr}$, $r = 0, 1, \ldots$, is not multinomial. For example, if $Z_n > 1$, the probability of the event
$$\Big\{\sum_{m=0}^{n} z_{m0} = Z_n,\ \sum_{m=0}^{n} z_{mr} = 0 \text{ for } r \ne 0\Big\}$$
is 0, since if every individual has no offspring, then $z_1 = 0$ and $Z_n = 1$.


We shall next show that the estimate $\hat{x}$ is, in a certain sense, consistent.

Theorem 7.2. If $x > 1$, the random variables $Z_{n+1}/Z_n$ converge in probability to the random variable $x^{V^*}$, where $V^* = 0$ if $w = 0$ and $V^* = 1$ if $w \ne 0$.

If $w \ne 0$, then $z_n \ne 0$ for all $n$, and $1/Z_n \to 0$ as $n \to \infty$. Hence in this case $(Z_{n+1} - 1)/Z_n$ converges to $x$ if $Z_{n+1}/Z_n$ does. On the other hand, $P(w = 0) = \alpha = P(z_n = 0 \text{ for some } n)$. So if $w = 0$, then $Z_{n+1}/Z_n = 1$ with probability 1 for $n$ large enough. Thus we need only show that $Z_{n+1}/Z_n$ converges to $x$ if $x > 1$ and $w \ne 0$.

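We need the following:

Lemma 7.1. If $x > 1$, the random variables $Z_n/x^n$ converge in probability to $wx/(x - 1)$.

Since
$$\frac{wx}{x - 1} - \frac{Z_n}{x^n} = \frac{x}{x - 1}\cdot\frac{w}{x^{n+1}} + \sum_{r=0}^{n} \frac{w - w_r}{x^{n-r}}, \tag{7.1}$$
it will be sufficient to show two things: that $\lim_{n\to\infty} \big(\tfrac{x}{x-1}\big)^2 x^{-2n-2} E(w^2) = 0$, and that $\lim_{n\to\infty} E\big[\sum_{r=0}^{n} (w - w_r)/x^{n-r}\big]^2 = 0$. The first statement is obvious, since $Ew^2$ is finite. It follows from (2.5) that $E(w_r w_s) = Ew_r^2$ if $s > r$, and that $E(w w_r) = \lim_{n\to\infty} E(w_n w_r) = Ew_r^2$. Hence
$$E(w - w_r)^2 = \frac{\sigma^2}{(x^2 - x)x^r}, \qquad E[(w - w_r)(w - w_s)] = \frac{\sigma^2}{(x^2 - x)x^s} \quad (s > r).$$
Then
$$E\Big[\sum_{r=0}^{n} \frac{w - w_r}{x^{n-r}}\Big]^2 = \frac{\sigma^2}{(x^2 - x)x^{2n}}\Big[\sum_{r=0}^{n} x^r + 2\sum_{0 \le r < s \le n} x^r\Big],$$
and this quantity clearly approaches 0 as $n \to \infty$, proving Lemma 7.1.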
Define the random variables $w^*$ and $V_n$ as
$$w^* = w \text{ when } w \ne 0, \qquad w^* = 1 \text{ when } w = 0;$$
$$V_n = Z_n/x^n \text{ when } z_n \ne 0, \qquad V_n = x/(x - 1) \text{ when } z_n = 0.$$
Clearly the $V_n$ converge in probability to $w^* x/(x - 1)$. We also note that the c.d.f. of $w^*$ is continuous at $w^* = 0$. Hence, for any $\epsilon > 0$,
$$\lim_{n\to\infty} P\Big(\Big|\frac{V_{n+1}}{V_n} - 1\Big| \ge \epsilon\Big) = \lim_{n\to\infty} P\big(|V_{n+1} - V_n| \ge \epsilon V_n\big) = P\Big(\epsilon\,\frac{w^* x}{x - 1} \le 0\Big) = 0.$$
It follows that, under the conditional hypothesis $w \ne 0$, the variates $Z_{n+1}/Z_n$ converge in probability to $x$, since
$$\frac{Z_{n+1}}{Z_n} = x\,\frac{V_{n+1}}{V_n} \quad \text{when } z_{n+1} \ne 0.$$
fj-n V n 


8. Continuous models. As mentioned in section 1, there are situations where it is more important to consider the number of individuals existing at a given time than the number in a given generation. Let a set of probabilities $p_r$ be given. The question arises whether we can interpret these as probabilities that an individual will have a given number of descendants at the end of some fixed period of time. We might then suppose that each individual in existence at that time has the same probabilities of having a given number of descendants at the end of the next (equal) length of time, these probabilities being independent of the age of the individual. A model of this sort might be considered in certain fission processes, if the probability of fission is independent of age. It should be noted that the "descendants" of an individual may include the individual. For example, if a bacterium splits in two, we may regard it either as having produced two descendants and dying, or as having produced one descendant and itself surviving.

If an interpretation of this sort is to be satisfactory, interpolation in time must be possible. In other words, there should exist a family of functions $f_n(s)$, defined for all positive $n$, with three properties. First, $f_{n_1}[f_{n_2}(s)] = f_{n_1+n_2}(s)$. Second, for each positive $n$, $f_n(s)$ is a probability generating function, $f_n(s) = \sum_{r=0}^{\infty} p_r(n) s^r$. Third, for $n = 0, 1, 2, \ldots$ the functions $f_n(s)$ coincide with the iterates $s, f(s), f[f(s)], \ldots$. We may then interpret $f_n(s)$ as the generating function at time $n$. It is readily seen that in general such a family of functions will not exist. For example, if such a family exists, $f(s)$ must be the $n$th iterate of $f_{1/n}(s)$ for arbitrarily large integral $n$, so that $f(s)$ cannot be a polynomial of degree $\ge 2$.
The functional equation $\phi(sx) = f[\phi(s)]$ shows that $f(s) = \phi[x\phi^{-1}(s)]$, whence $f_n(s) = \phi[x^n\phi^{-1}(s)]$ for integral $n$. The expression $\phi[x^n\phi^{-1}(s)]$ might then be taken as the definition of $f_n(s)$ for all positive $n$. See Hadamard [9]. The problem of determining whether the functions so defined form a family of generating functions will be discussed in a subsequent paper. We remark, however, that


if $f(s)$ has the fractional form considered in section 5, then the iterates $f_n(s)$ have a form of the same type. They are clearly generating functions for all positive $n$, satisfying the required relation $f_{n_1}(f_{n_2}) = f_{n_1+n_2}$. Now suppose $g(s)$ is some function such that the function
$$f(s) = g^{-1}\Big[\frac{g(s)}{x - (x - 1)g(s)}\Big]$$
is a generating function for all $x > 1$, with $g(1) = 1$. As pointed out by Ulam and Hawkins, the iterates of functions $f(s)$ of this form are convenient to work with, the $n$th iterate being simply
$$f_n(s) = g^{-1}\Big[\frac{g(s)}{x^n - (x^n - 1)g(s)}\Big].$$
In addition, the requirement that $f(s)$ be a generating function for all $x > 1$ shows that the functions $f_n(s)$ are generating functions for all $n > 0$. The simplest function $g(s)$ which satisfies our requirements is $g(s) = s^m$, where $m$ is any positive integer. In this case $f(s)$ has the form considered in (5.1), and $f_n(s) = s[x^n - (x^n - 1)s^m]^{-1/m}$. As $n \to 0$ we have
$$f_n(s) = \Big(1 - \frac{n\log x}{m}\Big)s + \frac{n\log x}{m}\,s^{m+1} + O(n^2).$$
We may interpret this as follows. In a short time interval $\Delta t$, a particle in existence at a given time either splits into $m + 1$ particles, with probability $(\log x/m)\,\Delta t$, or remains unaltered, with probability $1 - (\log x/m)\,\Delta t$. If it splits, each particle produced has the same chances of splitting as its parent, and so on. Thus, from the results of section 5, it follows that if we begin with a single particle at time $t = 0$, the asymptotic probability density function for $z_t/x^t$, where $z_t$ is the number of particles at time $t$, is
$$\frac{m^{-1/m}\, u^{1/m - 1} e^{-u/m}}{\Gamma(1/m)}.$$
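For $g(s) = s^m$ the interpolated iterates can be checked directly. The sketch below verifies numerically that $f_{n_1}[f_{n_2}(s)] = f_{n_1+n_2}(s)$ for non-integral $n_1$ and $n_2$. It also checks that $f_n(1) = 1$ and the first-order expansion as $n \to 0$. The values of $x$ and $m$ are arbitrary illustrative assumptions.

```python
import math

def f_n(s, n, x, m):
    """Interpolated iterate f_n(s) = s [x^n - (x^n - 1) s^m]^(-1/m), for 0 <= s <= 1, real n >= 0."""
    xn = x**n
    return s * (xn - (xn - 1.0) * s**m) ** (-1.0 / m)

x, m = 1.8, 2                                  # illustrative values
for s in (0.1, 0.5, 0.9):
    # Semigroup property f_{n1}[f_{n2}(s)] = f_{n1 + n2}(s) for non-integral n1, n2.
    print(f_n(f_n(s, 1.7, x, m), 0.3, x, m), f_n(s, 2.0, x, m))

# First-order behaviour as n -> 0: f_n(s) ~ (1 - n log x / m) s + (n log x / m) s^(m+1).
h, s = 1e-4, 0.6
print(f_n(s, h, x, m), (1 - h * math.log(x) / m) * s + h * math.log(x) / m * s ** (m + 1))
```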

It is, of course, customary to begin with the elementary probabilities for a certain number of births in a short time $\Delta t$, and to determine the functions $f_n(s)$ from these by means of differential equations. See, for example, Arley [17]. The results of the present paper can be applied in some cases to the continuous problem even when an explicit determination of the $f_n(s)$ is difficult. A discussion will be given in a later paper.
9. Some proofs. In this section we give proofs for (A) theorem 3.3, (B) theorem 3.4, (C) theorem 4.2, and (D) theorem 4.3. In certain cases we shall indicate slightly more general results.

(A) We make use of a result of Koenigs, in the form applicable here.

Koenigs' theorem: If $|s| \le \lambda < 1$ and $q_1 \ne 0$, then $k_n(s) = q_1^n B(s)[1 + O(q_1^n)]$, where $B(s)$ is analytic for $|s| \le \lambda$ and satisfies the functional equation $B[k(s)] = q_1 B(s)$.

Here $O(q_1^n)$ means bounded by $Aq_1^n$, where $A$ is independent of $s$. We remark that $B(s) \not\equiv 0$. The proof of Koenigs' theorem follows readily if we write
$$k_n(s) = q_1^n\, s \prod_{i=1}^{n}\Big[1 + \frac{\zeta(k_{i-1}(s))}{q_1}\Big], \qquad \text{where } \zeta(s) = \frac{k(s)}{s} - q_1.$$


Now let $h$ be a positive number such that $|\phi(s)| < 1$ when $0 < |s| \le h$ and $\operatorname{Re}(s) \le 0$. (For the rest of this proof we assume $\operatorname{Re}(s) \le 0$.) Such a number exists. On the imaginary axis we have $\phi(it) = 1 + it - \tfrac12 E[(w')^2]t^2 + o(t^2)$, where $E[(w')^2] > 1$, $w'$ having the distribution arising from $k(s)$. This shows that $|\phi(it)| < 1$ if $t \ne 0$ and $t$ is sufficiently small. If $\operatorname{Re}(s) < 0$, we refer to the expression $\phi(s) = \int e^{su}\,dH(u)$. Let $\lambda = \max|\phi(s)|$ for $h/x \le |s| \le h$.

If $|s| > h$, let $N(s)$ be the smallest integer such that $|s|/x^{N(s)} \le h$. Then
$$\phi(s) = k_{N(s)}[\phi(s/x^{N(s)})] = q_1^{N(s)} B[\phi(s/x^{N(s)})][1 + O(q_1^{N(s)})] = B[\phi(s)][1 + O(q_1^{N(s)})].$$
Now $B[\phi(sx)] = q_1 B[\phi(s)]$. Let $M(s) = |s|^{\gamma} B[\phi(s)]$. Then $M(sx) = M(s)$. Also $\log_x|s/h| \le N(s) < 1 + \log_x|s/h|$, and theorem 3.3 follows. Clearly $M(s)$ is continuous for $h/x \le |s| \le h$. Hence, by functional continuation, it is continuous wherever $\operatorname{Re}(s) \le 0$, $s \ne 0$.
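Koenigs' function can be approximated by the limit $B(s) = \lim_{n\to\infty} k_n(s)/q_1^n$, which is the usual construction behind the theorem quoted above. Purely for illustration, the sketch assumes $k(s) = 0.4s + 0.6s^2$, so that $q_1 = 0.4$. It then checks numerically the functional equation $B[k(s)] = q_1 B(s)$ and the normalization $B'(0) = 1$.

```python
def koenigs_B(k, q1, s, n=60):
    """Approximate B(s) = lim_{n -> oo} k_n(s) / q1**n (Koenigs' construction)."""
    t = s
    for _ in range(n):
        t = k(t)                                # t = k_n(s)
    return t / q1**n

def k(s):
    return 0.4 * s + 0.6 * s * s                # illustrative: k(0) = 0, q1 = k'(0) = 0.4

for s in (0.1, 0.5, 0.9):
    # Functional equation B[k(s)] = q1 B(s).
    print(s, koenigs_B(k, 0.4, k(s)), 0.4 * koenigs_B(k, 0.4, s))
print(koenigs_B(k, 0.4, 1e-6) / 1e-6)           # close to B'(0) = 1
```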

Concerning the remarks following Theorem 3.3, we have the following:

(a) If $E(w^r) < \infty$, $r$-fold differentiation of $\phi(sx^n) = k_n[\phi(s)]$ gives, for $|s| \ge h > 0$,
$$x^{nr}\,\phi^{(r)}(sx^n) = \frac{d^r}{ds^r}\,k_n[\phi(s)] = Q_n, \tag{9.1}$$
where $Q_n$ is a polynomial in $k_n'[\phi(s)], \ldots, k_n^{(r)}[\phi(s)]$ and $\phi'(s), \ldots, \phi^{(r)}(s)$, with no term free of the $k_n^{(j)}$. Now $|k_n(s)| = O(q_1^n)$ when $|s| \le \lambda$; because of analyticity, the same must be true of $|k_n^{(j)}(s)|$. Put $n = N(s)$ in (9.1), $N(s)$ being the integer defined above. Since $k^{(j)}_{N(s)}[\phi(s/x^{N(s)})] = O(q_1^{N(s)}) = O(1/|s|^{\gamma})$, remark (a) follows.

(b) $B(s)$ is clearly $> 0$ when $s > 0$; hence $M(s) > 0$ when $s < 0$. Since $B(0) = 0$ and $B'(0) = 1$, $B(s) \ne 0$ for sufficiently small $s \ne 0$. Since $\phi(s) \to 0$ as $|s| \to \infty$, $M(s) \ne 0$ for $|s|$ sufficiently large. Since $M(sx) = M(s)$, remark (b) follows.

(c) If $\gamma = \infty$, i.e., $q_1 = 0$, then $k_n(s)$ goes to zero with great rapidity as $n \to \infty$, if $|s| < 1$. The general line of argument is clear.

(B) Let $k(s)$ be a polynomial of degree $d > 1$ with real coefficients, $k(s) = q_0 + \cdots + q_d s^d$. Suppose $k$ has a non-negative double point, $k(a) = a \ge 0$, and that $k(s) > s$ when $s > a$. Let $\phi(s)$ be any solution of the functional equation $\phi(ms) = k[\phi(s)]$ which is continuous for $s \ge 0$ and satisfies $\phi(s) > a$ for $s > 0$; here $m$ is any number $> 1$. Then theorem 3.4 holds, with $x$ replaced by $m$.

It is not difficult to show that if $a < s_1 \le s \le s_2$, then $\lim_{n\to\infty} k_n(s) = \infty$ uniformly in $s$. Hence $\phi(s) \to \infty$ as $s \to \infty$. Write $R(s) = \log\big(1 + \sum_{i=0}^{d-1} (q_i/q_d)\, s^{i-d}\big)$, so that $\log k(s) = \log q_d + d\log s + R(s)$. Then
$$d^{-n}\log\phi(sm^n) = d^{-n}\log k_n[\phi(s)] = (1 - d^{-n})\frac{\log q_d}{d - 1} + \log\phi(s) + \sum_{j=0}^{n-1} d^{-j-1} R(k_j[\phi(s)]),$$
$s$ being taken large enough that $R(k_j[\phi(s)])$ is continuous. The functions $R(k_j[\phi(s)])$ are bounded. Thus the functions $d^{-n}\log\phi(sm^n)$ converge uniformly, for $s$ sufficiently large, to a continuous function $L^*(s)$ satisfying $L^*(ms) = dL^*(s)$. Let $L(s) = s^{-\rho}L^*(s)$, where $\rho = \log_m d$. Theorem 3.4 now follows by an argument similar to that used to conclude theorem 3.3. (Note that $\sum_{j=n}^{\infty} d^{-j-1} R(k_j[\phi(s)]) = O(d^{-n})$.)

(C) To avoid negative signs, we work with the Laplace transform instead of the m.g.f.

Let $H(u)$ be nondecreasing on $(0, \infty)$ with $H(0) = 0$, and let $\phi(s) = \int_0^\infty e^{-su}\,dH(u)$ be finite for $s > 0$. Suppose $\phi(s) = s^{-\gamma}M(s) + o(s^{-\gamma})$ as $s \to \infty$, where $0 < \gamma < \infty$. Suppose also that $M(s)$ is continuous and satisfies $M(sx) = M(s)$ for $s > 0$, $x$ being some number $> 1$. Then
$$\lim_{u \to 0+} \int_u^{ux} \frac{H(v)}{v^{\gamma+1}}\,dv = \frac{1}{\Gamma(\gamma + 1)} \int_1^x \frac{M(v)}{v}\,dv.$$

Following the lines of the proof of Karamata's theorem, we see that
$$\int_y^{yx} s^{\gamma-1}\phi(s)\,ds = D + o(1) \quad \text{as } y \to \infty, \qquad \text{where } D = \int_1^x \frac{M(s)}{s}\,ds.$$
That is, $\int_y^{yx} s^{\gamma-1}\,ds \int_0^\infty e^{-su}\,dH(u) = D + o(1)$. Replacing $s$ by $(n+1)s$, we get
$$\int_y^{yx} s^{\gamma-1}\,ds \int_0^\infty e^{-su}e^{-nsu}\,dH(u) = \frac{D}{(n+1)^{\gamma}} + o(1) = \frac{D}{\Gamma(\gamma)}\int_0^\infty e^{-ns}e^{-s}s^{\gamma-1}\,ds + o(1).$$
It follows, as in [ ], pp. 189-192, that if $F(\xi)$ is any function of bounded variation in $(0, 1)$, we have
$$\lim_{y\to\infty} \int_y^{yx} s^{\gamma-1}\,ds \int_0^\infty e^{-su}F(e^{-su})\,dH(u) = \frac{D}{\Gamma(\gamma)}\int_0^\infty e^{-s}F(e^{-s})\,s^{\gamma-1}\,ds. \tag{9.2}$$
Let $F(e^{-s}) = e^{s}$ if $0 \le s \le 1$ and 0 otherwise. Then the theorem follows from (9.2).

(D) Theorem 4.3 is true if $F(u)$ is any bounded monotone increasing function. For simplicity we assume that $F(1) = 0$; it is readily seen that this causes no loss of generality. The proof is given for the case $1 < \rho < \infty$. It will be clear that $\rho = 1$ implies $Q = \infty$, while if $\rho = \infty$ (or if $\phi(s)$ is not entire), then $Q = 1$. Suppose $m$ and $n$ are positive integers such that $m/n < \rho/(\rho - 1)$. Then
$$\int_1^\infty \exp(u^{m/n})\,dF(u) = \sum_{r=0}^{\infty}\frac{1}{r!}\int_1^\infty u^{rm/n}\,dF(u) \le n\sum_{r=0}^{\infty}\frac{[(r+1)m]!\,c_{(r+1)m}}{(rn)!}, \tag{9.3}$$
where $c_k = \phi^{(k)}(0)/k!$. The interchange of integration and summation is justified by the positiveness of all terms involved. Suppose $0 < \epsilon < n/m - 1 + 1/\rho$. Then for $k$ sufficiently large the inequality $c_k < k^{-k(1/\rho - \epsilon)}$ is satisfied; see [18], p. 253. Hence, using Stirling's formula, we see that the last series in (9.3) is dominated by a series whose $r$th term, for $r$ sufficiently large, is controlled by the factor
$$r^{rm(1 - 1/\rho + \epsilon - n/m)}.$$
Since $1 - 1/\rho + \epsilon - n/m$ is negative, the series, and hence the integral, converges. We have thus proved $1/Q + 1/\rho \le 1$.

Conversely, suppose $m/n > \rho/(\rho - 1)$. Let $\phi(s) = \sum_{k=0}^{m-1}\phi_k(s)$, where $\phi_k(s) = \sum_{r=0}^{\infty} c_{rm+k}\,s^{rm+k}$, $k = 0, 1, \ldots, m - 1$. At least one of the functions $\phi_k(s)$ must be of order $\rho$. We suppose that $\phi_0(s)$ is; if not, the argument needs only slight modifications. We have
$$\int_1^\infty \exp(u^{m/n})\,dF(u) \ge \sum_{r=0}^{\infty}\frac{[(r+1)m]!\,c_{(r+1)m}}{[(r+1)n]!}. \tag{9.4}$$
Suppose $0 < \epsilon < 1 - 1/\rho - n/m$. From [18], p. 253, the inequality $c_{rm} > (rm)^{-rm(1/\rho + \epsilon)}$ must hold for infinitely many values of $r$. As in the first half of the proof, this shows that the series and the integral in (9.4) diverge. Thus $1/Q + 1/\rho \ge 1$, and the proof is complete.

If $\rho$ is rational, the conjecture following theorem 4.3 can be proved in a similar manner. The proof uses a relation between the class of an entire function and the coefficients of its series expansion; see [14], p. 95.
MOST POWERFUL TESTS OF COMPOSITE HYPOTHESES. I. NORMAL DISTRIBUTIONS

By E. L. Lehmann and C. Stein
University of California, Berkeley

Summary. For testing a composite hypothesis, critical regions are determined which are most powerful against a particular alternative at a given level of significance. Here a region is said to have level of significance $\epsilon$ if the probability of the region under the hypothesis tested is bounded above by $\epsilon$. These problems have been considered by Neyman, Pearson and others, subject to the condition that the critical region be similar. Consider testing the hypothesis specifying the value of the variance of a normal distribution with unknown mean against an alternative with larger variance. In that problem, and in some other problems, the best similar region is also most powerful in the sense of this paper. However, the best similar region is not most powerful in the sense of this paper in the analogous problem when the variance under the alternative hypothesis is less than that under the hypothesis tested, and also in the case of Student's hypothesis when the
level of significance is less than ½, and in some other cases. There exist most powerful tests which are quite good against certain alternatives in some cases where no proper similar region exists. These results indicate that in some practical cases the standard test is not best if the class of alternatives is sufficiently restricted.

1. Introduction. The problem to be discussed in this paper is that of testing a composite hypothesis against a simple alternative. More specifically, let $\mathcal{F} = \{f\}$ be a family of probability density functions defined over a Euclidean space $R_n$, and let $g$ be a probability density function not in $\mathcal{F}$. We wish to test the hypothesis $H_0$ that the random variable $X = (X_1, \ldots, X_n)$ is distributed according to a density $f$ of $\mathcal{F}$ against the alternative $H_1$ that $X$ is distributed according to $g$. By a test we mean a region of rejection, $w$ in $R_n$.

Neyman and Pearson, in the fundamental paper [1] which laid the groundwork of the theory of optimum tests, restricted their considerations to similar regions. They considered a region (set) $w$ to be optimum for the given level of significance $\epsilon$ if it maximizes the power
$$\int_w g(x)\,dx \tag{1}$$
subject to the restriction
$$\int_w f(x)\,dx = \epsilon \quad \text{for all } f \text{ in } \mathcal{F}. \tag{2}$$
As Neyman, Wald and others have pointed out, it is more natural to replace the condition of similarity (2) by the weaker restriction
$$\int_w f(x)\,dx \le \epsilon \quad \text{for all } f \text{ in } \mathcal{F}. \tag{3}$$



A region $w$ maximizing (1) subject to (3) is called most powerful against the alternative $g$ at the level of significance $\epsilon$. Here and throughout the paper, all functions and sets are assumed to be Borel measurable.

In the present paper we shall consider certain composite hypotheses and derive tests for them which are most powerful against a simple alternative. For the cases in which these tests coincide with the standard similar regions, it will thus be established that no further increase in power is possible with tests of fixed sample size. In the more usual situation, the most powerful test depends strongly on the specific alternative chosen, and no such absolute justification of the standard test is possible. In these cases, any justification must take account of the fact that it is desired to obtain good power against a large class of alternatives. This can be done, for instance, by using Wald's definition of a most stringent test [2] or his concept of minimizing the maximum risk.¹ If, on the other hand, the class of alternatives is sufficiently restricted, the results of the present paper indicate that for small samples there may exist a test which is appreciably better than the standard test.

Frequently the probability of an error of the first kind is an analytic function of a nuisance parameter for every choice of critical region. Hence, if it is known that some nuisance parameter $\theta$ lies, say, in a certain finite interval $I$, then any test which is similar for $\theta$ in $I$ will be similar for all $\theta$. Consequently, the knowledge concerning $\theta$ cannot be used to find a more powerful test. On the other hand, as is indicated at the end of section 5, restrictions on the nuisance parameters may, for small samples, lead to considerably more powerful tests if the condition of similarity is replaced by the weaker condition (3).

There is one class of problems to which it may be desirable to apply the method of the present paper regardless of sample size, namely, problems in which no similar region exists. Suppose, for instance, that $X_1, \ldots, X_n$ are known to be normally and independently distributed, $X_i$ having unknown mean $\xi_i$ and variance $\sigma_i^2$ for $i = 1, \ldots, n$. For testing the hypothesis
$$H_0: \sigma_i = 1 \quad (i = 1, \ldots, n),$$
no similar region exists. However, it is easy to see that against any simple alternative
$$H_1: \sigma_i = \sigma_{i1} < 1, \quad \xi_i = \xi_{i1},$$
there exists a test which satisfies condition (3) and which has good power against $H_1$, provided the $\sigma_{i1}$ are sufficiently small.

The present first part of this paper is restricted to hypotheses concerning normal distributions. In later parts of the paper we intend to extend the considerations to exponential and rectangular distributions, to consider non-parametric problems, and possibly also to treat more complicated problems connected with normal distributions.

¹ In an unpublished paper, G. Hunt and C. Stein show that the traditional test is most stringent in several cases, including the (univariate) linear hypothesis and the hypothesis specifying the ratio of the variances of two normal distributions. These results can be extended to analogous problems for distributions other than the normal. Similar results can be proved regarding minimization of the maximum risk if the weight function has a certain type of symmetry.


2. Sufficient conditions for a most powerful test. The method used in this paper to obtain most powerful tests is an adaptation of the fundamental lemma of Neyman and Pearson [1]. At the same time it is essentially a special case of much more general results of Wald [3, 4], although the exact conditions of Wald's investigation are not satisfied in most of our problems.

Let $h$ and $g$ be two functions defined over $R_n$, let $k \ge 0$ be a constant, and let $w$ be a region in $R_n$ such that
$$g(x) \ge k\,h(x) \text{ in } w; \qquad g(x) \le k\,h(x) \text{ in } R_n - w. \tag{4}$$
Then if $w'$ is such that
$$\int_{w'} h(x)\,dx \le \int_w h(x)\,dx, \tag{5}$$
it follows, as in the fundamental lemma (where equality is assumed in (5) instead of inequality), that
$$\int_{w'} g(x)\,dx \le \int_w g(x)\,dx. \tag{6}$$
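On a finite sample space, the lemma (4)-(6) can be verified by brute force. In the sketch below, $g$ and $h$ are arbitrary illustrative probability vectors and $k \ge 0$. The region $w = \{x : g(x) > k h(x)\}$ satisfies (4), and every other region $w'$ obeying (5) is checked against (6). The function name is hypothetical.

```python
from itertools import combinations

def lemma_holds(g, h, k):
    """Brute-force check of (4)-(6) on the finite sample space {0, ..., len(g) - 1}."""
    points = range(len(g))
    w = [i for i in points if g[i] > k * h[i]]       # satisfies (4)
    h_w = sum(h[i] for i in w)
    g_w = sum(g[i] for i in w)
    for size in range(len(g) + 1):
        for w_prime in combinations(points, size):
            if sum(h[i] for i in w_prime) <= h_w:     # condition (5)
                if sum(g[i] for i in w_prime) > g_w + 1e-12:
                    return False                      # (6) would fail
    return True

g = [0.05, 0.10, 0.15, 0.20, 0.25, 0.25]              # illustrative "alternative"
h = [0.30, 0.25, 0.20, 0.10, 0.10, 0.05]              # illustrative "h"
print([lemma_holds(g, h, k) for k in (0.0, 0.5, 1.0, 2.0, 5.0)])
```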




Throughout the present paper we shall be concerned with the special case in which $\mathcal{F}$ is an $s$-parameter family. We may denote the members of $\mathcal{F}$ by $f_\theta$, and we shall obtain all members of $\mathcal{F}$ as $\theta$ ranges over a set $\omega$ in an $s$-dimensional Euclidean space. In the theorem which we shall now state, we are concerned with point functions $\lambda$ defined over $\omega$. We assume that $\lambda = c\mu$, where $c$ is a positive constant and $\mu$ a cumulative distribution function.² We also suppose that $f_\theta(x)$ is a measurable function of $x$ and $\theta$ jointly. However, the theorem is also valid if $\omega$ is an abstract space and $\lambda$ a (finite) non-negative additive set function (measure) over $\omega$. This more general interpretation may be required when applying the theory to non-parametric problems.

² The introduction of the distribution $\mu$ is simply a mathematical device and does not imply that $\theta$ is a random variable (see Wald [16], p. 282).

Theorem 1. Let $H_0$ be the hypothesis that the random variable $X$ is distributed according to a density function $f_\theta$ with $\theta$ in $\omega$, and let $H_1$ denote the alternative that $X$ is distributed according to a density $g$. Let $\lambda$ be a function defined over $\omega$ such that
$$\lambda = c\mu, \tag{7}$$
where $c$ is a positive constant and $\mu$ a cumulative distribution function. Let $k$ be a constant and let $w$ be a region in $R_n$ such that
$$g(x) \ge k\int_\omega f_\theta(x)\,d\lambda(\theta) \text{ in } w, \qquad g(x) \le k\int_\omega f_\theta(x)\,d\lambda(\theta) \text{ in } R_n - w. \tag{8}$$


Suppose that $w$ is of level of significance $\epsilon$ for testing $H_0$ against $H_1$, that is,
$$\int_w f_\theta(x)\,dx \le \epsilon \quad \text{for all } \theta \text{ in } \omega, \tag{9}$$
and suppose that the subset of $\omega$ for which
$$\int_w f_\theta(x)\,dx < \epsilon \tag{10}$$
has $\lambda$-measure zero. Then $w$ is most powerful for testing $H_0$ against $H_1$ at level of significance $\epsilon$.

Proof. Without loss of generality we shall assume $c = 1$. Let $w'$ be any test of level of significance $\epsilon$. Then
$$\int_{w'} f_\theta(x)\,dx \le \epsilon \quad \text{for all } \theta \text{ in } \omega, \tag{11}$$
and because of (7)
$$\int_\omega \Big[\int_{w'} f_\theta(x)\,dx\Big] d\lambda(\theta) \le \epsilon\int_\omega d\lambda(\theta) = \epsilon. \tag{12}$$
Since $\lambda$ is of bounded variation, we may interchange the order of integration in (12) and obtain
$$\int_{w'} h(x)\,dx \le \epsilon, \tag{13}$$
where
$$h(x) = \int_\omega f_\theta(x)\,d\lambda(\theta). \tag{14}$$
From (9) and the condition surrounding (10) it follows that
$$\int_\omega \Big[\int_w f_\theta(x)\,dx\Big] d\lambda(\theta) = \epsilon, \tag{15}$$
and therefore that
$$\int_w h(x)\,dx = \epsilon. \tag{16}$$
Thus $w$ and $w'$ satisfy conditions (4) and (5), and hence also (6), which completes the proof.
It is useful to notice that the assumptions of theorem 1 will be satisfied provided
$$\int_w f_\theta(x)\,dx$$
attains its maximum $\epsilon$ at all points of increase of $\lambda$. In particular, they are satisfied whenever $w$ is a similar region of size $\epsilon$.

In many problems we shall exhibit a function $\lambda$ which satisfies the conditions of theorem 1 without giving the reasons which led us to this function. However, the following comments on the tentative process that we used may be helpful. One may first examine the known most powerful similar region. If there exists a cumulative distribution function $\lambda$ such that (8) is the most powerful similar region, the problem is solved. If the most powerful similar region cannot even be approximated by (8) with a sequence of $\lambda$'s, it is reasonable to conclude that the most powerful test is not similar. In all the problems considered here, the probability of any test under the null hypothesis is an analytic function of the parameter. This implies that the probability of the most powerful test under the null hypothesis attains its maximum at an at most denumerable (in some cases finite) set of points. In all the cases of this kind which we considered in the present part I, it was then possible to prove the existence of a function $\lambda$ with a single point of increase which satisfied the conditions of theorem 1.

A theorem analogous to theorem 1 holds for most powerful similar regions. Let $H_0$ and $H_1$ be as before, and let $\lambda$ be a function of bounded variation, not necessarily non-decreasing. Let $w$ be a region in $R_n$ such that
$$g(x) \ge k\int_\omega f_\theta(x)\,d\lambda(\theta) \text{ in } w; \qquad g(x) \le k\int_\omega f_\theta(x)\,d\lambda(\theta) \text{ in } R_n - w. \tag{17}$$
Let $w$ be a similar region of level of significance $\epsilon$ for testing $H_0$ against $H_1$, that is, let
$$\int_w f_\theta(x)\,dx = \epsilon \quad \text{for all } \theta \text{ in } \omega. \tag{18}$$
Then $w$ is a most powerful similar region for testing $H_0$ against $H_1$.

For all the problems considered in this paper we shall prove the existence of functions $\lambda$ satisfying the conditions of theorem 1, but we have not investigated the corresponding existence problem in general. On the other hand, one verifies easily that for many of the cases treated here in which the most powerful test is not similar, the method for obtaining most powerful similar regions does not apply. However, for all the problems considered in the present paper, the most powerful similar tests can be obtained easily by other methods [1, 5, 6, 7, 8]. For most of the problems the corresponding derivations have been carried out in the literature.
Although we restrict ourselves in the present paper to the problem of maximizing
the power at a single alternative, theorem 1 clearly also applies to the more
general problem of maximizing the average power over surfaces in a space of
alternatives. Such problems have been considered from the point of view of
similar regions by Wald, Hsu and others [9, 10, 11].


3. Testing the values of one or several variances. Let $X_1, \dots, X_n$ be a sample
from a normal population with mean $\xi$ and variance $\sigma^2$, both unknown. We
want to test the hypothesis $H_0$ that $\sigma = \sigma_0$ against the simple alternative that
$\sigma = \sigma_1$, $\xi = \xi_1$. We shall show that the most powerful test for $H_0$ against $H_1$
is

$$\sum (x_i - \xi_1)^2 < k \quad \text{when } \sigma_1 < \sigma_0, \tag{19}$$

$$\sum (x_i - \bar x)^2 > c \quad \text{when } \sigma_1 > \sigma_0, \tag{20}$$

where $k$ and $c$ are determined by the level of significance. Thus the best similar
region is most powerful if the variance under the alternative is greater than that
under the null hypothesis, while the most powerful tests against the other
alternatives are not similar. That the region $\sum (x_i - \bar x)^2 > c$ ($< c'$) is most powerful
of all similar regions against $\sigma_1 > \sigma_0$ ($\sigma_1 < \sigma_0$) was shown by Neyman and
Pearson [1].

We consider first the case $\sigma_1 < \sigma_0$, and apply theorem 1 with $\lambda$ a step function
having a single jump at $\xi_1$, that is,

$$\lambda(\xi) = \begin{cases} 0 & \text{if } \xi < \xi_1, \\ 1 & \text{if } \xi \ge \xi_1. \end{cases} \tag{21}$$

The region $w$ given by (8) thus becomes

$$\frac{\exp\left[-\frac{1}{2\sigma_1^2}\sum (x_i-\xi_1)^2\right]}{\exp\left[-\frac{1}{2\sigma_0^2}\sum (x_i-\xi_1)^2\right]} > k, \tag{22}$$

which is equivalent to

$$\sum (x_i - \xi_1)^2 < k', \tag{23}$$

since $\sigma_1 < \sigma_0$. The size of the region (23), that is, its probability under the null
hypothesis, is a function of $\xi$ and clearly attains its maximum when $\xi = \xi_1$. Thus
all conditions of theorem 1 are satisfied provided we choose $k'$ so that the maximum
size of (23) equals $\epsilon$.
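The following Python sketch, which is ours and not part of the paper, illustrates (23) numerically. It picks the bound (called $k'$ here) so that the size is exactly $\epsilon$ at $\xi = \xi_1$, using the fact that $\sum (X_i-\xi_1)^2/\sigma_0^2$ is chi-square with $n$ degrees of freedom when $\xi = \xi_1$ and noncentral chi-square with noncentrality $n(\xi-\xi_1)^2/\sigma_0^2$ otherwise. The incomplete-gamma power series and the Poisson-mixture form of the noncentral distribution are standard numerical devices assumed here; $n = 5$, $\sigma_0 = 2$, $\epsilon = 0.05$ are arbitrary illustrative values.

```python
import math


def reg_lower_gamma(s, x):
    """Regularized lower incomplete gamma P(s, x) via its power series (adequate for moderate x)."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / s
    j = 1
    while term > total * 1e-17 and j < 100000:
        term *= x / (s + j)
        total += term
        j += 1
    return total * math.exp(-x + s * math.log(x) - math.lgamma(s))


def chi2_cdf(t, df):
    """P(chi-square with df degrees of freedom <= t)."""
    return reg_lower_gamma(df / 2.0, t / 2.0)


def noncentral_chi2_cdf(t, df, delta, terms=200):
    """Noncentral chi-square CDF as a Poisson(delta / 2) mixture of central CDFs."""
    lam = delta / 2.0
    weight, total = math.exp(-lam), 0.0
    for j in range(terms):
        total += weight * chi2_cdf(t, df + 2 * j)
        weight *= lam / (j + 1)
    return total


def critical_value_23(n, sigma0, eps):
    """k' with P{sum (X_i - xi_1)^2 < k'} = eps when every X_i ~ N(xi_1, sigma0^2)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, n) < eps:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, n) < eps:
            lo = mid
        else:
            hi = mid
    return sigma0 ** 2 * 0.5 * (lo + hi)


def size_23(k_prime, n, sigma0, shift):
    """Size of (23) under H0 when the true common mean is xi_1 + shift."""
    delta = n * shift ** 2 / sigma0 ** 2
    return noncentral_chi2_cdf(k_prime / sigma0 ** 2, n, delta)


if __name__ == "__main__":
    kp = critical_value_23(5, 2.0, 0.05)
    for shift in (0.0, 0.5, 1.0, 2.0):
        print(shift, size_23(kp, 5, 2.0, shift))
```

The printed sizes fall off as the true mean moves away from $\xi_1$, which is the property theorem 1 needs from the single jump point of $\lambda$.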

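Before considering the case $\sigma_1 > \sigma_0$, we state for later reference the following:

Lemma 1. If $\sigma_1 > \sigma_0$ there exists an absolutely continuous non-decreasing function
$\lambda$ of bounded variation such that

$$\int_{-\infty}^{\infty} \frac{1}{\sigma_0}\exp\left[-\frac{1}{2\sigma_0^2}(x-\xi)^2\right] d\lambda(\xi) = \frac{1}{\sigma_1}\exp\left[-\frac{1}{2\sigma_1^2}(x-\xi_1)^2\right] \quad \text{for all } x. \tag{24}$$

This follows immediately from the well known representation of $\exp(\tfrac12 t^2)$
as a Laplace transform by applying a translation, and is easily verified directly
by substituting

$$\lambda'(\xi) = \frac{1}{\sqrt{2\pi(\sigma_1^2-\sigma_0^2)}}\exp\left[-\frac{(\xi-\xi_1)^2}{2(\sigma_1^2-\sigma_0^2)}\right]. \tag{25}$$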
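As a numerical sanity check of lemma 1, the sketch below evaluates the left side of (24) by the trapezoidal rule, taking $\lambda'$ to be the normal density with mean $\xi_1$ and variance $\sigma_1^2-\sigma_0^2$, and compares it with the right side. The parameter values, window width and step count are arbitrary choices for illustration; they are not part of the lemma.

```python
import math


def mixing_density(xi, sigma0, sigma1, xi1):
    """lambda'(xi): normal density with mean xi1 and variance sigma1^2 - sigma0^2."""
    tau2 = sigma1 ** 2 - sigma0 ** 2
    return math.exp(-(xi - xi1) ** 2 / (2.0 * tau2)) / math.sqrt(2.0 * math.pi * tau2)


def lhs_24(x, sigma0, sigma1, xi1, steps=20000):
    """Left side of (24) by the trapezoidal rule on a window wide enough to hold the mass."""
    spread = 12.0 * max(sigma0, math.sqrt(sigma1 ** 2 - sigma0 ** 2))
    lo, hi = min(x, xi1) - spread, max(x, xi1) + spread
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps + 1):
        xi = lo + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        kernel = math.exp(-(x - xi) ** 2 / (2.0 * sigma0 ** 2)) / sigma0
        total += weight * kernel * mixing_density(xi, sigma0, sigma1, xi1)
    return total * h


def rhs_24(x, sigma1, xi1):
    """Right side of (24)."""
    return math.exp(-(x - xi1) ** 2 / (2.0 * sigma1 ** 2)) / sigma1


if __name__ == "__main__":
    for x in (-2.0, 0.0, 0.7, 3.5):
        print(x, lhs_24(x, 1.0, 1.8, 0.5), rhs_24(x, 1.8, 0.5))
```

In probabilistic terms this is the statement that adding an independent $N(0, \sigma_1^2-\sigma_0^2)$ shift to an $N(\xi, \sigma_0^2)$ variable gives variance $\sigma_1^2$.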


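Now let $\sigma_1 > \sigma_0$ and $n > 1$. The region $w$ given by (8) can be expressed in the
form

$$\frac{\exp\left[-\frac{1}{2\sigma_1^2}\sum (x_i-\bar x)^2\right]}{\exp\left[-\frac{1}{2\sigma_0^2}\sum (x_i-\bar x)^2\right]}\cdot\frac{\exp\left[-\frac{n}{2\sigma_1^2}(\bar x-\xi_1)^2\right]}{\int_{-\infty}^{\infty}\exp\left[-\frac{n}{2\sigma_0^2}(\bar x-\xi)^2\right]d\lambda(\xi)} > k. \tag{26}$$

By lemma 1 (applied with $\sigma_0/\sqrt n$ and $\sigma_1/\sqrt n$ in place of $\sigma_0$ and $\sigma_1$) there exists
an absolutely continuous function $\lambda$ for which the second factor is constant. For this
$\lambda$, (26) is equivalent to

$$\sum (x_i - \bar x)^2 > c, \tag{27}$$

and since this is a similar region, the conditions of theorem 1 are satisfied provided
$c$ is chosen so as to give the correct level of significance.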
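A companion sketch for (27), again only an illustration under assumed values ($n = 6$, $\sigma_0 = 1.5$, $\epsilon = 0.05$). Under $H_0$, $\sum (X_i-\bar X)^2/\sigma_0^2$ is chi-square with $n-1$ degrees of freedom whatever $\xi$ is, so $c$ is $\sigma_0^2$ times the upper $\epsilon$ point of that distribution. The statistic is also unchanged by a common shift of all observations, which is the reason the region is similar.

```python
import math


def reg_lower_gamma(s, x):
    """Regularized lower incomplete gamma P(s, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / s
    j = 1
    while term > total * 1e-17 and j < 100000:
        term *= x / (s + j)
        total += term
        j += 1
    return total * math.exp(-x + s * math.log(x) - math.lgamma(s))


def chi2_upper_point(eps, df):
    """t with P(chi-square_df > t) = eps, found by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - reg_lower_gamma(df / 2.0, hi / 2.0) > eps:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - reg_lower_gamma(df / 2.0, mid / 2.0) > eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def within_ss(x):
    """Sum of squared deviations from the sample mean: the statistic of (27)."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x)


def critical_value_27(n, sigma0, eps):
    """c in (27) giving size eps under H0 (sigma = sigma0, any mean)."""
    return sigma0 ** 2 * chi2_upper_point(eps, n - 1)


if __name__ == "__main__":
    c = critical_value_27(6, 1.5, 0.05)
    x = [0.3, -1.2, 2.5, 0.9, 1.1, -0.4]
    print(c, within_ss(x), within_ss([v + 7.25 for v in x]))
```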

We next consider the problem in which the random variables $X_i$ ($i = 1, \dots, n$)
are independently normally distributed with unknown means $\xi_i$ and unknown
variances $\sigma_i^2$. We wish to test the hypothesis $H_0$: $\sigma_i = \sigma_{i0}$ for $i = 1, \dots, n$
against the alternative $H_1$: $\sigma_i = \sigma_{i1}$, $\xi_i = \xi_{i1}$. Feller [12] showed that there
exist no similar regions for this problem. However, as we shall now show, when
the critical regions are not required to be similar, non-trivial tests against $H_1$
do exist provided $\sigma_{i1} < \sigma_{i0}$ for at least one value of $i$.

Let us assume without loss of generality that $\sigma_{i1} < \sigma_{i0}$ for $i = 1, \dots, m$ and
$\sigma_{i1} > \sigma_{i0}$ for $i = m + 1, \dots, n$, where $n - m$ may be zero but where for the
moment we shall assume $m > 0$. With $\lambda(\xi_1, \dots, \xi_n) = \prod_{i=1}^n \lambda_i(\xi_i)$, the region
(8) becomes

$$\prod_{i=1}^{m}\frac{\exp\left[-\frac{1}{2\sigma_{i1}^2}(x_i-\xi_{i1})^2\right]}{\int_{-\infty}^{\infty}\exp\left[-\frac{1}{2\sigma_{i0}^2}(x_i-\xi_i)^2\right]d\lambda_i(\xi_i)}\cdot\prod_{i=m+1}^{n}\frac{\exp\left[-\frac{1}{2\sigma_{i1}^2}(x_i-\xi_{i1})^2\right]}{\int_{-\infty}^{\infty}\exp\left[-\frac{1}{2\sigma_{i0}^2}(x_i-\xi_i)^2\right]d\lambda_i(\xi_i)} > k. \tag{28}$$

For $\lambda_i$ ($i = 1, \dots, m$) we take step functions with a single jump at $\xi_{i1}$, while
for the remaining $\lambda$'s we choose the absolutely continuous functions which make
the second factor constant and whose existence is guaranteed by lemma 1. The
region (28) thus reduces to

$$\sum_{i=1}^{m}\left(\frac{1}{\sigma_{i1}^2}-\frac{1}{\sigma_{i0}^2}\right)(x_i-\xi_{i1})^2 < c. \tag{29}$$

Since the probability of the region (29) is independent of $\xi_{m+1}, \dots, \xi_n$ and, as
$\xi_1, \dots, \xi_m$ vary, takes on its maximum when $\xi_i = \xi_{i1}$, it follows from theorem
1 that this region is most powerful for testing $H_0$ against $H_1$.

We still have to consider the case $m = 0$, that is, the case in which $\sigma_{i1} > \sigma_{i0}$
for all $i$. To treat this problem we adjoin to the variables $X_1, \dots, X_n$ a random
variable $Y$ uniformly distributed between 0 and 1, that is, essentially a table of
random numbers. In the space of $n + 1$ random variables we determine a region
$w$ according to (8), letting $\lambda(\xi_1, \dots, \xi_n) = \prod_{i=1}^n \lambda_i(\xi_i)$ and choosing the $\lambda$'s so
as to make the left hand side of (8) equal to the right hand side. This is possible
by lemma 1, and with this choice of the $\lambda$'s the inequalities (9) become

$$k \ge k \quad \text{in } w, \qquad k \le k \quad \text{in } R_{n+1} - w, \tag{30}$$

and hence they impose no restrictions on $w$. Thus any similar region of the
correct size will satisfy the conditions of theorem 1. It follows that the region

$$w:\ 0 < y < \epsilon, \tag{31}$$

being a similar region of size $\epsilon$, is most powerful. This result means that we do
not use the observations $x_1, \dots, x_n$ at all but consult a table of random numbers.

The situation just described occurs in other problems to which the same
method of proof can be applied. It is therefore convenient for later reference to
formulate the following

Theorem 2. Let $H_0$ be the hypothesis that the random variable $X$ is distributed
according to a probability density function $f_\theta$ with $\theta$ in $\omega$, and let $H_1$ denote the
alternative that $X$ is distributed according to the density function $g$. Let $Y$ be a random
variable known to be uniformly distributed over the interval $[0, 1]$. If there exists a
real valued function $\lambda$ satisfying (7) for which

$$g(x) = k \int_\omega f_\theta(x)\, d\lambda(\theta), \tag{32}$$

then the critical region $0 < y < \epsilon$ is most powerful for testing $H_0$ against $H_1$ at level
of significance $\epsilon$.


4. Testing equality of variances and the value of the circular serial correlation
coefficient. For each $i = 1, \dots, m$ let $X_{ij}$ ($j = 1, \dots, n_i$) be a sample from a
normal distribution with $E(X_{ij}) = \xi_i$ and $E(X_{ij} - \xi_i)^2 = \sigma_i^2$. We are concerned
with the hypothesis $H_0$ that $\sigma_1 = \sigma_2 = \dots = \sigma_m$, where first we shall
assume the $\xi$'s to be known, so that without loss of generality we may assume
them equal to 0. The alternative hypothesis specifies $\sigma_i = \sigma_{i1}$, $i = 1, \dots, m$.
Let $\sigma^2$ denote the unknown common variance under $H_0$ and let $\lambda(\sigma)$ be a step
function with a single jump at a point $\sigma_0$ to be determined later. With
$k = \prod_{i=1}^m (\sigma_0/\sigma_{i1})^{n_i}$, the test (8) takes on the form

$$\frac{\exp\left[-\frac12\sum_{i=1}^{m}\frac{1}{\sigma_{i1}^2}\sum_{j=1}^{n_i}x_{ij}^2\right]}{\exp\left[-\frac{1}{2\sigma_0^2}\sum_{i=1}^{m}\sum_{j=1}^{n_i}x_{ij}^2\right]} > 1, \tag{33}$$

or equivalently

$$\frac{\sum_{i=1}^{m}\sigma_{i1}^{-2}\sum_{j=1}^{n_i}x_{ij}^2}{\sum_{i=1}^{m}\sum_{j=1}^{n_i}x_{ij}^2} < \frac{1}{\sigma_0^2}. \tag{34}$$

Since the function on the left hand side is homogeneous of degree 0 in the $x_{ij}$,
this is a similar region and the conditions of theorem 1 are therefore satisfied
provided the region has the correct size. This can be achieved for any level
of significance $\epsilon$ by proper choice of $\sigma_0^2$.
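To see the similarity of (34) concretely, here is a hypothetical sketch (the group sizes, alternative standard deviations and Monte Carlo settings are invented for illustration). The left side of (34) is a ratio of quadratic forms, so multiplying every observation by the same factor leaves it unchanged. Its null distribution therefore does not involve the common $\sigma$, and an $\epsilon$-quantile estimated at $\sigma = 1$ serves as an estimate of $1/\sigma_0^2$ for every $\sigma$.

```python
import random


def ratio_34(samples, sigma1):
    """Left side of (34): sum_i S_i / sigma_i1^2 divided by sum_i S_i, with S_i = sum_j x_ij^2."""
    s = [sum(x * x for x in row) for row in samples]
    return sum(si / (sig * sig) for si, sig in zip(s, sigma1)) / sum(s)


def draw_null(sizes, sigma, rng):
    """One data set under H0: known means 0 and a common standard deviation sigma."""
    return [[rng.gauss(0.0, sigma) for _ in range(n)] for n in sizes]


def estimate_cutoff(sizes, sigma1, eps, reps=5000, seed=1):
    """Monte Carlo eps-quantile of ratio_34 under H0. Region (34) rejects below it,
    so the returned value is an estimate of 1 / sigma_0^2."""
    rng = random.Random(seed)
    stats = sorted(ratio_34(draw_null(sizes, 1.0, rng), sigma1) for _ in range(reps))
    return stats[int(eps * reps)]


if __name__ == "__main__":
    sizes, sigma1 = [4, 6, 5], [0.8, 1.0, 1.6]
    cut = estimate_cutoff(sizes, sigma1, 0.05)
    print("estimated 1/sigma_0^2:", cut, "sigma_0:", cut ** -0.5)
```

Because the statistic is a weighted average of the $1/\sigma_{i1}^2$, the estimated cutoff always lies between the smallest and largest of these values.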

As stated earlier, the conditions of theorem 1 imply that the size of the critical
region is equal to $\epsilon$ at all points of increase of $\lambda$. As a consequence, if the size
equals $\epsilon$ at only a finite number of points of $\omega$, $\lambda$ must be a step function. Also,
if each point of a certain interval is a point of increase of $\lambda$, the critical region
must be similar over that interval (and, if the functions involved are analytic,
the region must be similar over $\omega$). However, the last problem shows that the
converse of neither of these two statements is correct: the region (34)
is a similar region although the corresponding $\lambda$ has only a single point of increase.

Next we consider the hypothesis of equality of variances without assuming the
means to be known. For the case $m = 2$ the most powerful similar region was
obtained by Neyman and Pearson [1]. We assume first that $n_i > 1$ for all $i$,
and we take $\lambda(\sigma, \xi_1, \dots, \xi_m) = \lambda_0(\sigma)\prod_{i=1}^m \lambda_i(\xi_i)$, with $\lambda_0(\sigma)$, as before, a step
function with a single jump at a point $\sigma_0$ to be determined later. Suppose now that
$\sigma_0 > \sigma_{i1}$ for $i = 1, \dots, s$ and $\sigma_0 \le \sigma_{i1}$ for $i = s + 1, \dots, m$, where
$\sigma_{11} \le \sigma_{21} \le \dots \le \sigma_{m1}$, $0 \le s \le m$, and $s$ depends on $\sigma_0$. Then define

$$\lambda_i(\xi_i) = \begin{cases} 0 & \text{if } \xi_i < \xi_{i1}, \\ 1 & \text{if } \xi_i \ge \xi_{i1}, \end{cases} \tag{35}$$

for $i = 1, \dots, s$, and use lemma 1 for $i = s + 1, \dots, m$.

For proper choice of $k$ the critical region will then be determined by the
inequality

$$\sum_{i=1}^{s}\left(\frac{1}{\sigma_0^2}-\frac{1}{\sigma_{i1}^2}\right)\sum_{j=1}^{n_i}(x_{ij}-\xi_{i1})^2+\sum_{i=s+1}^{m}\left(\frac{1}{\sigma_0^2}-\frac{1}{\sigma_{i1}^2}\right)\sum_{j=1}^{n_i}(x_{ij}-\bar x_i)^2 > 0. \tag{36}$$

The probability of this region computed under $H_0$ is independent of
$\xi_{s+1}, \dots, \xi_m$ and for any $\sigma$ attains its maximum when $\xi_i = \xi_{i1}$ ($i = 1, \dots, s$).
Since the probability of the region is independent of $\sigma$ when $\xi_i = \xi_{i1}$ for
$i = 1, \dots, s$, the conditions of theorem 1 are again established. That for $\xi_i = \xi_{i1}$
the size of (36) goes continuously from 0 to 1 with decreasing $\sigma_0$ is easily checked,
since at the only doubtful points $\sigma_0 = \sigma_{i1}$ (where the value of $s$ changes) the
corresponding coefficient $1/\sigma_0^2 - 1/\sigma_{i1}^2$ passes through 0.

We still have to consider the case that some of the $n_i$ are equal to 1. If $n_i = 1$
for some $i \le s$ there is no change whatever, while if $n_i = 1$ for some $i > s$,
the corresponding term in (36) vanishes. It follows easily that if $n_i > 1$ for at
least one value of $i > 1$ the solution (36) is valid. On the other hand, if $n_i = 1$
for all $i > 1$, we can apply theorem 2 by taking $\sigma_0 = \sigma_{11}$, $\lambda_1(\xi_1)$ as a step function
with a single jump at $\xi_{11}$, and the remaining $\lambda_i(\xi_i)$ according to lemma 1. It thus
follows that for this problem no non-trivial test exists.

The following problem can be reduced to the hypothesis of equality of variances
with means assumed known. Under the null hypothesis $X_1, \dots, X_n$ have
a joint multivariate normal distribution with density $C\exp\left[-\frac{1}{2\sigma^2}\sum a_{ij}x_ix_j\right]$,
where the $a$'s are known and where $\sigma$ is an unknown scale factor. Under $H_1$
the $X$'s have a joint multivariate normal distribution with density $C'\exp\left[-\frac12\sum b_{ij}x_ix_j\right]$.
A number of hypotheses specifying the value of one or several
correlation coefficients have this form. The most powerful test of $H_0$ against
$H_1$ is given by

$$\frac{\sum b_{ij}x_ix_j}{\sum a_{ij}x_ix_j} < c,$$

as is easily shown by applying a non-singular linear transformation which reduces
$\sum a_{ij}x_ix_j$ to a sum of squares and $\sum b_{ij}x_ix_j$ to diagonal form, or by applying
directly the method of proof of the earlier problem.

A corresponding reduction when the $X$'s have a common but unknown mean is
usually impossible. One problem of this kind for which the solution is simple is
the hypothesis specifying the value of a serial correlation coefficient in a circular
population. The most powerful similar region for testing this hypothesis was
obtained in [7]. Consider the probability density function

$$C\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[(x_i-\xi)-\delta(x_{i+1}-\xi)\right]^2\right\}, \qquad (x_{n+1}=x_1),\ |\delta|<1, \tag{37}$$

and let $H_0$ specify $\delta = \delta_0$ while $H_1$ assigns to the parameters the values $\sigma_1, \xi_1, \delta_1$.
Then the most powerful test of $H_0$ against $H_1$ is

$$\frac{\sum (x_i-\bar x)(x_{i+1}-\bar x)}{\sum (x_i-\bar x)^2} > r \quad \text{if } \delta_1 > \delta_0, \qquad \frac{\sum (x_i-\xi_1)(x_{i+1}-\xi_1)}{\sum (x_i-\xi_1)^2} < r' \quad \text{if } \delta_1 < \delta_0. \tag{38}$$

We shall omit the proof of this result, since the method is the same as in the other
problems considered in this section.
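For concreteness, the two circular serial correlation statistics that appear in (38), one centred at $\bar x$ and one centred at $\xi_1$, with $x_{n+1} = x_1$, can be computed as below. This is an illustrative sketch only: the function name and the `center` argument are ours, and the sketch does not determine the critical values $r$, $r'$.

```python
def circular_serial_correlation(x, center=None):
    """sum (x_i - c)(x_{i+1} - c) / sum (x_i - c)^2 with x_{n+1} = x_1.

    c is the sample mean by default (statistic centred at x-bar); pass
    center=xi1 for the statistic centred at the alternative mean."""
    n = len(x)
    c = sum(x) / n if center is None else center
    d = [v - c for v in x]
    num = sum(d[i] * d[(i + 1) % n] for i in range(n))   # wraps around: x_{n+1} = x_1
    den = sum(v * v for v in d)
    return num / den


if __name__ == "__main__":
    print(circular_serial_correlation([1.0, 2.0, 3.0, 4.0]))              # mean-centred
    print(circular_serial_correlation([1.0, 2.0, 3.0, 4.0], center=0.0))  # centred at 0
```

By the Cauchy-Schwarz inequality both versions lie in $[-1, 1]$.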


5. Student's hypothesis and some generalizations. As the principal result of
the present section we shall prove that for testing Student's hypothesis against a
simple alternative the most powerful test is a non-similar region of the form

$$\sum (X_i - \eta)^2 < k, \tag{39}$$

if the level of significance $\epsilon$ is less than or equal to $\tfrac12$. Here $\eta$ and $k$ depend on
$\epsilon$ and on the alternative, and they will not be determined explicitly. It will be
shown also that if $\epsilon$ is greater than or equal to $\tfrac12$, Student's test is most powerful.
These results will be extended rather easily to the general univariate linear
hypothesis. The corresponding investigation for similar regions was carried
through for Student's hypothesis by Neyman and Pearson [1], while the extension
to a general linear hypothesis is contained in a paper by Hsu [13].

The proof of the main result mentioned above is rather lengthy. We shall
begin by proving two lemmas.

Lemma 2. Let $Y_1, \dots, Y_n$ be $n$ independent random variables, normally distributed
with 0 mean and unit variance, and let

$$P(a, k) = P\left\{\sum (Y_i - a)^2 < (n - k)a^2\right\}, \qquad \varphi(k) = \sup_a P(a, k) \quad \text{for } 0 < k < n,\ 0 < a. \tag{40}$$

Then for each $k$ there exists $a(k)$ such that

$$P(a(k), k) = \varphi(k). \tag{41}$$

Proof. If $Z_i = Y_i/a$ ($i = 1, \dots, n$), the $Z$'s are independently normally
distributed with zero mean and variance $1/a^2$, and (40) may be written as

$$P(a, k) = P\left\{\sum (Z_i - 1)^2 < n - k\right\}. \tag{42}$$

Hence it is seen that for any $k$, $P(a, k)$ tends to zero as $a$ tends to either zero or
infinity. This proves the lemma, since for any $k$, $P(a, k)$ is a continuous function
of $a$.

Lemma 3. Given any $\epsilon$, $0 < \epsilon < \tfrac12$, there exists $k(\epsilon)$ between zero and $n$ such that
$\varphi(k(\epsilon)) = \epsilon$.

Proof. The proof will be given in a number of steps.

(i) $\varphi(k) \to \tfrac12$ as $k \to 0$.

Clearly $P(a, k)$ never exceeds $\tfrac12$, since $\sum (Y_i - a)^2 < (n - k)a^2$ is equivalent to
$\sum Y_i^2 + ka^2 < 2a\sum Y_i$ and therefore implies $\sum Y_i > 0$. The result will therefore
follow if we exhibit a sequence $a_k$ such that $P(a_k, k) \to \tfrac12$ as $k \to 0$. Let
$a_k = 1/\sqrt{k}$. Then

$$P(a_k, k) = P\left\{\sqrt{k}\sum Y_i^2 - 2\sum Y_i + \sqrt{k} < 0\right\}. \tag{43}$$

The right hand side is a continuous function of $k$ and therefore tends to

$$P\left\{\sum Y_i > 0\right\} = \tfrac12 \tag{44}$$

as $k$ tends to zero.

(ii) $\varphi(k) \to 0$ as $k \to n$.

Consider $P(a, k)$ as in (42). Written as an integral of the probability density
of the $Z$'s, the region of integration is independent of $a$ and its volume tends to
0 as $k$ tends to $n$. On the other hand, the probability density depends on $a$ but is
uniformly bounded over the region of integration if $k > 0$, and hence the
result follows.

(iii) If $0 < k_0$, $P(a, k)$ tends to zero uniformly for $k$ in the interval $k_0 \le k < n$
as $a$ tends to zero or infinity.

This follows from the fact that $0 \le P(a, k) \le P(a, k_0)$ for $k \ge k_0$, while $P(a, k_0)$
tends to 0 as $a$ tends to zero or infinity.

(iv) Given $k_0$ and $k_1$ there exist numbers $a_0$ and $a_1$ with $0 < a_0 < a_1 < \infty$
such that $0 < k_0 \le k \le k_1 < n$ implies $a_0 \le a(k) \le a_1$.

If this were not true there would exist a sequence $k^{(\nu)}$ with $k_0 \le k^{(\nu)} \le k_1$ and
$a(k^{(\nu)})$ tending to infinity or zero. Then $\varphi(k^{(\nu)}) = P(a(k^{(\nu)}), k^{(\nu)})$ would tend
to zero by (iii). On the other hand, consider $P(1, k)$ for $k_0 \le k \le k_1$. This is a
continuous non-vanishing function of $k$ and hence attains its lower bound $m$ for some
$k$ in $k_0 \le k \le k_1$. Therefore $m$ is positive, and since $\varphi(k) \ge P(1, k) \ge m$ we
have a contradiction.

(v) For any $k_0, k_1$ with $0 < k_0 < k_1 < n$, $\varphi$ is continuous on the interval
$k_0 \le k \le k_1$.

To see this, select $a_0$ and $a_1$ in accordance with (iv). Then $P(a, k)$ is uniformly
continuous in the rectangle $a_0 \le a \le a_1$, $k_0 \le k \le k_1$. Given $\eta > 0$ let $\delta$ be
such that $|k' - k''| < \delta$ implies $|P(a, k') - P(a, k'')| < \eta$. Then
$\varphi(k') \ge P(a(k''), k') > P(a(k''), k'') - \eta = \varphi(k'') - \eta$, and by symmetry
$\varphi(k'') > \varphi(k') - \eta$, which establishes the continuity of $\varphi$.

The proof of the lemma is now immediate. For let $0 < \epsilon < \tfrac12$. It follows
from (i) and (ii) that there exist $k_0 < k_1$ such that

$$\varphi(k_1) < \epsilon/2, \qquad \varphi(k_0) > \epsilon + \tfrac12\left(\tfrac12 - \epsilon\right),$$

and hence by (v) there exists $k(\epsilon)$ for which $\varphi(k(\epsilon)) = \epsilon$.
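For $n = 1$ the quantities of lemmas 2 and 3 can be computed in closed form, which gives a useful check on the argument. Then $P(a,k) = \Phi(a(1+r)) - \Phi(a(1-r))$ with $r = \sqrt{1-k}$, and setting $\partial P/\partial a = 0$ gives the unique maximizer $a(k)^2 = \operatorname{artanh}(r)/r$. This short calculation is ours, not the paper's. The sketch evaluates $\varphi(k)$ this way and finds $k(\epsilon)$ by bisection, relying on the fact that $\varphi$ is non-increasing in $k$ because $P(a,k)$ is.

```python
import math


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def P(a, k):
    """P(a, k) of (40) for n = 1: P{(Y - a)^2 < (1 - k) a^2}, Y ~ N(0, 1)."""
    r = math.sqrt(1.0 - k)
    return Phi(a * (1.0 + r)) - Phi(a * (1.0 - r))


def a_of_k(k):
    """Maximizer a(k) for n = 1, from dP/da = 0: a^2 = artanh(r) / r with r = sqrt(1 - k)."""
    r = math.sqrt(1.0 - k)
    return math.sqrt(math.atanh(r) / r)


def phi(k):
    """phi(k) = sup_a P(a, k) of lemma 2, n = 1."""
    return P(a_of_k(k), k)


def k_of_eps(eps, tol=1e-14):
    """k(eps) of lemma 3: solve phi(k) = eps by bisection, 0 < eps < 1/2."""
    if not 0.0 < eps < 0.5:
        raise ValueError("lemma 3 needs 0 < eps < 1/2")
    lo, hi = 1e-12, 1.0 - 1e-12        # phi(lo) is near 1/2, phi(hi) is near 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid) > eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for eps in (0.01, 0.05, 0.25):
        k = k_of_eps(eps)
        print(eps, k, a_of_k(k), phi(k))
```

The limits in steps (i) and (ii) are visible here: $\varphi$ approaches $\frac12$ as $k \to 0$ and $0$ as $k \to 1$.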

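Let us now consider Student's hypothesis. The random variables $X_1, \dots, X_n$
are a sample from a normal distribution which under $H_0$ has mean 0 and unknown
variance $\sigma^2$, while under $H_1$ the mean is $\xi_1$ and the variance $\sigma_1^2$. Without
loss of generality we shall assume $\xi_1 > 0$. Applying theorem 1 with $\lambda$ a step
function having a single jump at a point $\sigma_0 > \sigma_1$ to be determined later, we obtain
the critical region in the form

$$\left(\frac{1}{\sigma_1^2}-\frac{1}{\sigma_0^2}\right)\sum X_i^2 - \frac{2\xi_1}{\sigma_1^2}\sum X_i < c. \tag{45}$$

Let $Y_i = X_i/\sigma$, so that under $H_0$ the $Y$'s are distributed with zero mean and unit
variance. Then (45) becomes

$$\sigma^2\left(\frac{1}{\sigma_1^2}-\frac{1}{\sigma_0^2}\right)\sum Y_i^2 - \frac{2\xi_1\sigma}{\sigma_1^2}\sum Y_i < c, \tag{46}$$

which may be written as

$$\sum (Y_i - a)^2 < (n - k)a^2, \tag{47}$$

where

$$a = \frac{\xi_1}{\sigma\left(1 - \sigma_1^2/\sigma_0^2\right)}, \qquad k = -\frac{c\,\sigma_1^2\left(1 - \sigma_1^2/\sigma_0^2\right)}{\xi_1^2}. \tag{48}$$

As $\sigma$ varies from 0 to $\infty$, $a$ goes from $\infty$ to 0. Let $P(a, k)$, $\varphi(k)$ and $a(k)$ be
defined as in lemma 2. Given the level of significance $\epsilon$ ($0 < \epsilon < \tfrac12$), let $k^*$
and $a^*$ be determined according to lemmas 2 and 3 so that

$$\varphi(k^*) = \epsilon \quad \text{and} \quad P(a^*, k^*) = \varphi(k^*). \tag{49}$$

We now select $\sigma_0 > \sigma_1$ and $c$ so that

$$a^* = \frac{\xi_1}{\left(1 - \sigma_1^2/\sigma_0^2\right)\sigma_0} \quad \text{and} \quad k^* = -\frac{c\,\sigma_1^2\left(1 - \sigma_1^2/\sigma_0^2\right)}{\xi_1^2}. \tag{50}$$

This is possible because $(1 - \sigma_1^2/\sigma_0^2)\sigma_0 = \sigma_0 - \sigma_1^2/\sigma_0$ increases from 0 to $\infty$
as $\sigma_0$ increases from $\sigma_1$ to $\infty$. We have to show that for this choice of $\sigma_0$ and $c$
the size of the critical region attains its maximum when $\sigma = \sigma_0$ and that this
maximum size is $\epsilon$. Substituting from (50) we express the region (47) in the form

$$\sum\left(Y_i - \frac{\sigma_0}{\sigma}a^*\right)^2 < (n - k^*)\left(\frac{\sigma_0}{\sigma}\right)^2 a^{*2}. \tag{51}$$

Thus the probability of the region is

$$P\left(\frac{\sigma_0}{\sigma}a^*, k^*\right). \tag{52}$$

As $\sigma$ varies, (52) attains its maximum when $(\sigma_0/\sigma)a^* = a(k^*) = a^*$, that is, when
$\sigma = \sigma_0$, and the maximum value of (52) is $\varphi(k^*) = \epsilon$.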
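The construction in (49)-(52) can be carried out numerically when $n = 1$, using the closed form $a(k)^2 = \operatorname{artanh}(r)/r$, $r = \sqrt{1-k}$, for the maximizer in lemma 2 (our own calculation for this special case). The hypothetical sketch below takes $\epsilon$, $\xi_1 > 0$ and $\sigma_1$, solves (49) and (50) for $k^*$, $a^*$, $\sigma_0$ and $c$, and reports the interval (39) with centre $\eta = \sigma_0 a^*$ and bound $(1-k^*)\eta^2$. It then checks that the size (52) is largest at $\sigma = \sigma_0$. Beyond the formulas just derived, nothing here comes from the paper.

```python
import math


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def P(a, k):
    """P(a, k) of (40) for n = 1."""
    r = math.sqrt(1.0 - k)
    return Phi(a * (1.0 + r)) - Phi(a * (1.0 - r))


def a_of_k(k):
    """Maximizer of P(., k) for n = 1: a^2 = artanh(r) / r, r = sqrt(1 - k)."""
    r = math.sqrt(1.0 - k)
    return math.sqrt(math.atanh(r) / r)


def k_of_eps(eps):
    """Solve phi(k) = eps (lemma 3) by bisection; phi is non-increasing in k."""
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if P(a_of_k(mid), mid) > eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def construct_test(eps, xi1, sigma1):
    """sigma0 and c from (49)-(50), plus the interval (39), for n = 1 and eps < 1/2."""
    k_star = k_of_eps(eps)
    a_star = a_of_k(k_star)
    b = xi1 / a_star                      # (50): sigma0 - sigma1^2 / sigma0 = xi1 / a*
    sigma0 = 0.5 * (b + math.sqrt(b * b + 4.0 * sigma1 ** 2))
    c = -k_star * xi1 ** 2 / (sigma1 ** 2 * (1.0 - sigma1 ** 2 / sigma0 ** 2))
    eta = sigma0 * a_star
    K = (1.0 - k_star) * eta ** 2
    return {"k_star": k_star, "a_star": a_star, "sigma0": sigma0, "c": c, "eta": eta, "K": K}


def size(t, sigma):
    """(52): probability of the region when xi = 0 and the standard deviation is sigma."""
    return P(t["sigma0"] * t["a_star"] / sigma, t["k_star"])


def in_region_45(x, t, xi1, sigma1):
    """Form (45) of the critical region."""
    return (1.0 / sigma1 ** 2 - 1.0 / t["sigma0"] ** 2) * x * x - 2.0 * xi1 / sigma1 ** 2 * x < t["c"]


def in_region_39(x, t):
    """Form (39) of the same region: an interval centred at eta."""
    return (x - t["eta"]) ** 2 < t["K"]


if __name__ == "__main__":
    t = construct_test(0.05, 1.0, 1.0)
    print(t)
    print("size at sigma0:", size(t, t["sigma0"]))
```

The positive root of $\sigma_0^2 - (\xi_1/a^*)\sigma_0 - \sigma_1^2 = 0$ is used for $\sigma_0$, and the resulting $c$ is negative, as (50) requires when $k^* > 0$.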

This derivation is valid even when $n = 1$, i.e., when the hypothesis $\xi = 0$ is
to be tested by observing only a single random variable $X$, known to be normally
distributed but whose mean $\xi$ and variance are unknown. For this problem
no similar region exists. However, critical regions of the form
$0 < \xi_1 - a < x < \xi_1 + b$ will give any level of significance $< \tfrac12$ for proper choice of
$a$ and $b$, while the power of such regions will tend to 1 as $\sigma_1$ tends to 0. Therefore,
the power of the most powerful test will be close to 1 if $\sigma_1$ is sufficiently small.
Having completed the discussion of the case $\epsilon < \tfrac12$, let us next suppose that
$\epsilon \ge \tfrac12$. We shall need the following

Lemma 4. Let $c$ and $a_1$ be positive constants. Then there exists a function $f$
such that $f(a) = 0$ when $a < a_1$ and such that for all $w > 0$

$$\int_0^\infty e^{-wa} f(a)\, da = e^{-c\sqrt{w} - a_1 w}. \tag{53}$$

This follows from the well known representation of $e^{-c\sqrt{w}}$ as a Laplace
transform by applying a translation; (53) can be checked directly by substituting

$$f(a) = \frac{c}{2\sqrt{\pi}}\,(a - a_1)^{-3/2}\exp\left[-\frac{c^2}{4(a - a_1)}\right] \quad \text{for } a > a_1. \tag{54}$$
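A quick numerical check of lemma 4 may reassure the reader. The sketch takes $f(a) = \frac{c}{2\sqrt\pi}(a-a_1)^{-3/2}\exp[-c^2/(4(a-a_1))]$ for $a > a_1$, substitutes $a = a_1 + e^t$, and applies the trapezoidal rule in $t$ to $\int_0^\infty e^{-wa}f(a)\,da$, comparing the result with $e^{-c\sqrt w - a_1 w}$. The constants $c = 1.3$, $a_1 = 0.7$, the values of $w$ and the quadrature settings are our own illustrative choices.

```python
import math


def f_54(a, c, a1):
    """f(a): c / (2 sqrt(pi)) (a - a1)^{-3/2} exp[-c^2 / (4 (a - a1))] for a > a1, else 0."""
    u = a - a1
    if u <= 0.0:
        return 0.0
    return c / (2.0 * math.sqrt(math.pi)) * u ** -1.5 * math.exp(-c * c / (4.0 * u))


def laplace_f_54(w, c, a1, t_lo=-25.0, t_hi=25.0, steps=20000):
    """Approximate int_0^inf exp(-w a) f(a) da using a = a1 + e^t (so da = e^t dt)."""
    h = (t_hi - t_lo) / steps
    const = c / (2.0 * math.sqrt(math.pi))
    total = 0.0
    for j in range(steps + 1):
        u = math.exp(t_lo + j * h)                               # u = a - a1 > 0
        g = const * u ** -1.5 * math.exp(-c * c / (4.0 * u))    # f(a1 + u), written in u
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * math.exp(-w * (a1 + u)) * g * u
    return total * h


def rhs_53(w, c, a1):
    """Right side exp(-c sqrt(w) - a1 w)."""
    return math.exp(-c * math.sqrt(w) - a1 * w)


if __name__ == "__main__":
    for w in (0.25, 1.0, 4.0):
        print(w, laplace_f_54(w, 1.3, 0.7), rhs_53(w, 1.3, 0.7))
```

Writing the integrand in terms of $u = a - a_1$ avoids the loss of precision that computing $a - a_1$ from $a$ would cause near $a_1$.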


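Applying theorem 1 to Student's hypothesis, where again we shall assume $\xi_1$
to be positive, for proper choice of $k$ we obtain from (9)

$$\frac{\exp\left[-\frac{1}{2\sigma_1^2}\sum X_i^2 + \frac{\xi_1}{\sigma_1^2}\sum X_i\right]}{\int_0^\infty \frac{1}{\sigma^n}\exp\left[-\frac{1}{2\sigma^2}\sum X_i^2\right]d\lambda(\sigma)} > 1. \tag{55}$$

It follows from lemma 4 that for any positive $c$ there exists a non-decreasing
function $\lambda$ of bounded variation, with $\lambda(\sigma)$ constant for $\sigma > \sigma_1$, such that

$$\int_0^\infty \frac{1}{\sigma^n}\exp\left[-\frac{1}{2\sigma^2}\sum X_i^2\right]d\lambda(\sigma) = \exp\left[-\frac{1}{2\sigma_1^2}\sum X_i^2 - c\sqrt{\sum X_i^2}\right]. \tag{56}$$

(This is lemma 4 with $w = \frac12\sum X_i^2$, $a = 1/\sigma^2$, $a_1 = 1/\sigma_1^2$ and $c\sqrt2$ in place of $c$,
the factor $\sigma^{-n}$ being absorbed into $d\lambda$.) For this choice of $\lambda$, (55) reduces to

$$\exp\left[\frac{\xi_1}{\sigma_1^2}\sum X_i\right] > \exp\left[-c\sqrt{\sum X_i^2}\right], \tag{57}$$

and hence to

$$\frac{\sum X_i}{\sqrt{\sum X_i^2}} > c', \qquad c' = -\frac{c\,\sigma_1^2}{\xi_1}. \tag{58}$$

This is a similar region and therefore most powerful for testing Student's
hypothesis against $H_1$. By adjusting $c$, the size of the region can be made equal
to any $\epsilon \ge \tfrac12$.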

The argument for $\epsilon \ge \tfrac12$ must be modified slightly in the case $n = 1$, that is,
when we want to test Student's hypothesis on the basis of a single observation.
Let us adjoin to the variable $X$ a random variable $Y$ known to be uniformly
distributed over the interval $[0, 1]$. Using the same $\lambda$ and $k$ as before, (58)
becomes

$$\frac{x}{|x|} > c'. \tag{59}$$

For $c' = -1$ the critical region includes all points $(x, y)$ for which $x$ is positive,
while (59) places no restriction on which of the remaining points to include in
the critical region. The similar region

$$x > 0; \qquad x < 0,\ 0 < y < 2\left(\epsilon - \tfrac12\right) \tag{60}$$

therefore satisfies all conditions of theorem 1 and hence is most powerful.
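The region (60) rejects with probability 1 when $x > 0$ and with probability $2(\epsilon - \frac12)$ when $x < 0$. The sketch below, with illustrative parameter values of our own, computes its size and power exactly and also simulates it, with a pseudo-random $Y$ standing in for the table of random numbers. Under $\xi = 0$ the size is $\frac12 + \frac12\cdot 2(\epsilon-\frac12) = \epsilon$ whatever $\sigma$ is, which is why (60) is similar.

```python
import math
import random


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def reject_60(x, y, eps):
    """Region (60) for eps >= 1/2: reject if x > 0, or if x < 0 and 0 < y < 2(eps - 1/2)."""
    return x > 0.0 or (x < 0.0 and 0.0 < y < 2.0 * (eps - 0.5))


def size_60(eps):
    """Exact size under xi = 0: P(X > 0) = P(X < 0) = 1/2 for every sigma."""
    return 0.5 + 0.5 * 2.0 * (eps - 0.5)


def power_60(eps, xi1, sigma1):
    """Exact power when X ~ N(xi1, sigma1^2)."""
    p_pos = Phi(xi1 / sigma1)
    return p_pos + (1.0 - p_pos) * 2.0 * (eps - 0.5)


def simulate_60(eps, mean, sigma, reps=50000, seed=3):
    """Monte Carlo rejection rate; rng.random() plays the role of the random-number table."""
    rng = random.Random(seed)
    hits = sum(reject_60(rng.gauss(mean, sigma), rng.random(), eps) for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    for s in (0.1, 1.0, 10.0):
        print("sigma", s, "simulated size", simulate_60(0.7, 0.0, s))
    print("power", power_60(0.7, 1.0, 1.0), simulate_60(0.7, 1.0, 1.0))
```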

In extending these results to the general linear hypothesis, we shall assume the
hypothesis reduced to canonical form [14, 15]. We shall therefore assume that
$X_1, \dots, X_n$ are normally distributed with common variance which is unknown
under $H_0$ and has the value $\sigma_1^2$ under $H_1$. Furthermore, under $H_0$, $E(X_i) = 0$
for $i = 1, \dots, s, s + 1, \dots, m$, and $E(X_i)$ is unknown for $i = m + 1, \dots, n$, while
under $H_1$, $E(X_i) = 0$ for $i = s + 1, \dots, m$ and $E(X_i) = \xi_{i1}$ for the remaining values
of $i$.

For $\epsilon < \tfrac12$ we shall consider critical regions of the form

$$\frac{\exp\left\{-\frac{1}{2\sigma_1^2}\left[\sum_{i=1}^{s}(x_i-\xi_{i1})^2+\sum_{i=s+1}^{m}x_i^2+\sum_{i=m+1}^{n}(x_i-\xi_{i1})^2\right]\right\}}{\exp\left\{-\frac{1}{2\sigma_0^2}\left[\sum_{i=1}^{s}x_i^2+\sum_{i=s+1}^{m}x_i^2+\sum_{i=m+1}^{n}(x_i-\xi_{i1})^2\right]\right\}} > k, \tag{61}$$

which are obtained from (8) by substituting for $\lambda$ a step-function with a single
jump at the parameter point $(\sigma_0, \xi_{m+1,1}, \dots, \xi_{n1})$. Making an orthonormal
transformation from $x_1, \dots, x_s$ to $y_1, \dots, y_s$ such that
$y_1 = \sum_{i=1}^s \xi_{i1}x_i \big/ \sqrt{\sum_{i=1}^s \xi_{i1}^2}$, and letting $y_i = x_i$ for $i = s + 1, \dots, m$ and
$y_i = x_i - \xi_{i1}$ for $i = m + 1, \dots, n$, (61) reduces to

$$\frac{\exp\left\{-\frac{1}{2\sigma_1^2}\left[\sum_{i=1}^{n}y_i^2-2y_1\sqrt{\sum_{i=1}^{s}\xi_{i1}^2}\right]\right\}}{\exp\left\{-\frac{1}{2\sigma_0^2}\sum_{i=1}^{n}y_i^2\right\}} > c. \tag{62}$$

For $\sigma_0 > \sigma_1$ we can rewrite (62) as

$$\sum_{i=m+1}^{n}y_i^2 < c' - \sum_{i=1}^{m}y_i^2 + \frac{2\sigma_0^2}{\sigma_0^2-\sigma_1^2}\,y_1\sqrt{\sum_{i=1}^{s}\xi_{i1}^2}, \tag{63}$$

and we see that under $H_0$, for any $\sigma$, the size of this region considered as a function
of the unknown means of $Y_{m+1}, \dots, Y_n$ takes on its maximum when these
means are zero, i.e. when $\xi_i = \xi_{i1}$ for $i = m + 1, \dots, n$. For these maximizing
values of the means the existence of a suitable $\sigma_0$ and $c$ follows from the
corresponding result in connection with Student's hypothesis.

Thus the most powerful test for testing $H_0$ against $H_1$ at level of significance
$\epsilon < \tfrac12$ has the form

$$\sum_{i=1}^{s}(x_i-\eta_i)^2+\sum_{i=s+1}^{m}x_i^2+\sum_{i=m+1}^{n}(x_i-\xi_{i1})^2 < c, \qquad \eta_i = \frac{\sigma_0^2\,\xi_{i1}}{\sigma_0^2-\sigma_1^2}. \tag{64}$$
It is interesting that the variables $X_i$ ($i = m + 1, \dots, n$), which may be discarded
when considerations are restricted to similar regions [18], do contribute
to the power when similarity is not required. The same phenomenon also
occurs in certain problems considered earlier in this paper.

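For the case $\epsilon \ge \tfrac12$, let us take

$$\lambda(\sigma, \xi_{m+1}, \dots, \xi_n) = \lambda(\sigma)\prod_{i=m+1}^{n}\lambda_i(\xi_i \mid \sigma). \tag{65}$$

We shall select $\lambda(\sigma)$ such that $\lambda(\sigma)$ is constant when $\sigma > \sigma_1$. Hence it is enough
to define $\lambda_i(\xi_i \mid \sigma)$ for $\sigma < \sigma_1$. For any $\sigma < \sigma_1$ there exists by lemma 1 a
function $\lambda_i(\xi_i \mid \sigma)$ such that

$$\int_{-\infty}^{\infty}\exp\left[-\frac{1}{2\sigma^2}(x_i-\xi_i)^2\right]d\lambda_i(\xi_i \mid \sigma) = k\exp\left[-\frac{1}{2\sigma_1^2}(x_i-\xi_{i1})^2\right]. \tag{66}$$

For this choice of the $\lambda_i$, (9) becomes

$$\frac{\exp\left\{-\frac{1}{2\sigma_1^2}\left[\sum_{i=1}^{s}(x_i-\xi_{i1})^2+\sum_{i=s+1}^{m}x_i^2\right]\right\}}{\int_0^\infty \frac{1}{\sigma^m}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{m}x_i^2\right]d\lambda(\sigma)} > k'. \tag{67}$$

Next we choose $\lambda(\sigma)$ according to lemma 4 such that

$$\int_0^\infty \frac{1}{\sigma^m}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{m}x_i^2\right]d\lambda(\sigma) = \exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^{m}x_i^2 - c\sqrt{\sum_{i=1}^{m}x_i^2}\right], \tag{68}$$

thus, by proper choice of $k'$, reducing (67) to

$$\frac{\sum_{i=1}^{s}\xi_{i1}x_i}{\sqrt{\sum_{i=1}^{m}x_i^2}} > -c\,\sigma_1^2. \tag{69}$$

The probability of this region under $H_0$ is independent of $\xi_{m+1}, \dots, \xi_n$ and $\sigma$,
and hence (69) is most powerful for testing $H_0$ against $H_1$.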

Let us return once more to the problem of testing Student's hypothesis against
a simple alternative $\xi = \xi_1$, $\sigma = 1$, and let us assume as known that $\sigma \le 1$. No
use can be made of this knowledge if consideration is restricted to similar regions.
For the probability of an error of the first kind is an analytic function of $\sigma$, and
consequently, if a test is similar with respect to all values of $\sigma$ which are $\le 1$, it is
similar with respect to all values of $\sigma$. Let us now consider this problem without the
restriction of similarity. If $\epsilon \ge \tfrac12$ the knowledge concerning $\sigma$ does not enable
us to find a test which is more powerful than that given by (58), since the function
$\lambda(\sigma)$ on which (58) was based had all its points of increase for $\sigma \le 1$.

On the other hand, we may expect improvement for $\epsilon < \tfrac12$, since the most
powerful test in this case was based on a function $\lambda$ with a single point of increase
$\sigma_0 > 1$, which is no longer admitted as a possible value of $\sigma$. If, instead, we take
for $\lambda$ the step function with a single jump at $\sigma = 1$ we obtain the critical region

$$\frac{\exp\left[-\frac12\sum (x_i-\xi_1)^2\right]}{\exp\left[-\frac12\sum x_i^2\right]} > k, \tag{70}$$

which is equivalent to

$$\bar x > c. \tag{71}$$

Here $c > 0$ since $\epsilon < \tfrac12$, and therefore, when $\xi = 0$, the probability of (71) is an
increasing function of $\sigma$ and hence takes on its maximum at $\sigma = 1$. It follows
from theorem 1 that (71) is most powerful under the conditions stated.
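A small numerical illustration of (71), under the assumed values $n = 10$ and $\epsilon = 0.05$. With $\bar X \sim N(0, \sigma^2/n)$ under $\xi = 0$, the cutoff $c = z_{1-\epsilon}/\sqrt n$ gives size exactly $\epsilon$ at $\sigma = 1$ and smaller size for every $\sigma < 1$. The normal quantile is obtained by bisection on `math.erf`, which is only a convenience of this sketch.

```python
import math


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def Phi_inv(p):
    """Inverse of Phi by bisection, 0 < p < 1."""
    lo, hi = -40.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cutoff_71(n, eps):
    """c in (71) with P(xbar > c) = eps when xi = 0 and sigma = 1."""
    return Phi_inv(1.0 - eps) / math.sqrt(n)


def size_71(c, n, sigma):
    """P(xbar > c) when xi = 0: xbar ~ N(0, sigma^2 / n)."""
    return 1.0 - Phi(c * math.sqrt(n) / sigma)


def power_71(c, n, xi1):
    """P(xbar > c) at the alternative xi = xi1, sigma = 1."""
    return 1.0 - Phi((c - xi1) * math.sqrt(n))


if __name__ == "__main__":
    c = cutoff_71(10, 0.05)
    for sigma in (0.25, 0.5, 0.75, 1.0):
        print(sigma, size_71(c, 10, sigma))
    print("power at xi1 = 0.5:", power_71(c, 10, 0.5))
```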

In the opposite problem, in which it is known that $\sigma \ge 1$, the situation is
reversed. For $\epsilon < \tfrac12$ no improvement over (45) is possible, while for $\epsilon > \tfrac12$ we
can use for $\lambda$ the step function with a single step at $\sigma = 1$, thus obtaining the
critical region (70), but this time with $c < 0$ in its equivalent form (71). When $\xi = 0$
the probability of this region is a decreasing function of $\sigma$, and it follows that (70)
is most powerful in this case.

Similar remarks apply to other problems. We mention as one further example
a modification of the Behrens-Fisher problem. Let $X_1, \dots, X_n$ and
$Y_1, \dots, Y_m$ be independently normally distributed, the $X$'s with mean $\xi$ and
variance $\sigma^2$, the $Y$'s with mean $\eta$ and variance $\tau^2$, all four parameters being
unknown. We wish to test, at level of significance $\epsilon < \tfrac12$, the hypothesis $\xi = \eta$
against the simple alternative $\xi = \xi_1$, $\eta = \eta_1$, $\sigma = 1$, $\tau = 1$, where $\xi_1 \ne \eta_1$, and
we assume it known that $\sigma \le 1$, $\tau \le 1$. Basing the test on a step function $\lambda$
with a single jump at $\sigma = 1$, $\tau = 1$, $\xi = \eta = (n\xi_1 + m\eta_1)/(m + n)$, we obtain for $w$
the region

$$\frac{\exp\left[-\frac12\sum (x_i-\xi_1)^2-\frac12\sum (y_j-\eta_1)^2\right]}{\exp\left[-\frac12\sum\left(x_i-\frac{n\xi_1+m\eta_1}{n+m}\right)^2-\frac12\sum\left(y_j-\frac{n\xi_1+m\eta_1}{n+m}\right)^2\right]} > k, \tag{73}$$

which is equivalent to

$$\bar y - \bar x > c \quad (c > 0), \tag{74}$$

if we assume, as we may without loss of generality, that $\eta_1 > \xi_1$. When $\eta = \xi$,
$\bar y - \bar x$ is normally distributed with zero mean and variance $\sigma^2/n + \tau^2/m$. Therefore
the probability of (74) is an increasing function of $\sigma^2/n + \tau^2/m$ and hence attains
its maximum when $\sigma = \tau = 1$. It follows from theorem 1 that the region
(74) is most powerful for the problem under consideration.
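Numerically, the test (74) is easy to set up. The sketch below assumes $n = 8$, $m = 12$, $\epsilon = 0.05$ purely for illustration. It takes $c = z_{1-\epsilon}\sqrt{1/n + 1/m}$ so that the size is $\epsilon$ at $\sigma = \tau = 1$, and confirms that the size is smaller for every admissible pair $\sigma, \tau \le 1$, since the variance $\sigma^2/n + \tau^2/m$ of $\bar y - \bar x$ can only shrink.

```python
import math


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def Phi_inv(p):
    """Inverse of Phi by bisection, 0 < p < 1."""
    lo, hi = -40.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cutoff_74(n, m, eps):
    """c in (74): P(ybar - xbar > c) = eps when xi = eta and sigma = tau = 1."""
    return Phi_inv(1.0 - eps) * math.sqrt(1.0 / n + 1.0 / m)


def size_74(c, n, m, sigma, tau):
    """P(ybar - xbar > c) when xi = eta: ybar - xbar ~ N(0, sigma^2/n + tau^2/m)."""
    return 1.0 - Phi(c / math.sqrt(sigma ** 2 / n + tau ** 2 / m))


def power_74(c, n, m, diff):
    """Power at the alternative sigma = tau = 1 with eta1 - xi1 = diff > 0."""
    return 1.0 - Phi((c - diff) / math.sqrt(1.0 / n + 1.0 / m))


if __name__ == "__main__":
    c = cutoff_74(8, 12, 0.05)
    for s, t in [(1.0, 1.0), (0.5, 1.0), (1.0, 0.3), (0.2, 0.9)]:
        print(s, t, size_74(c, 8, 12, s, t))
    print("power for eta1 - xi1 = 1:", power_74(c, 8, 12, 1.0))
```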


6. Admissibility. The general problem to be considered in this paper has
been formulated in section 1: to obtain a region $w$

$$\text{maximizing} \quad \int_w g(x)\, dx \tag{75}$$

subject to the restriction

$$\int_w f_\theta(x)\, dx \le \epsilon \quad \text{for all } \theta \in \omega. \tag{76}$$

Since for any particular such problem there may exist several essentially different
regions satisfying these conditions, it may happen that there exists a region $w'$
such that

$$\int_{w'} g(x)\, dx = \int_w g(x)\, dx \tag{77}$$

and

$$\int_{w'} f_\theta(x)\, dx \le \int_w f_\theta(x)\, dx \quad \text{for all } \theta \in \omega, \tag{78}$$

with strict inequality holding for some $\theta$. Clearly $w'$ is preferable to $w$. In this case,
following the definition of Wald [4], we say that $w$ is not admissible. We shall
rule out this possibility for a large class of problems by proving

Theorem 3. If $w$ satisfies the conditions of theorem 1, and if the set of points
$x$ for which equality holds in (8) has measure zero, then any region satisfying (75)
and (76) differs from $w$ only on a set of measure zero.

Proof. Without loss of generality we shall assume $\lambda$ of theorem 1 to be a
distribution function. Then

$$h(x) = \int_\omega f_\theta(x)\, d\lambda(\theta)$$

is a completely specified probability density function, and $w$ is the unique³
(up to a set of measure zero) most powerful test for testing the simple hypothesis
$H_0'$: $h$ against the simple alternative $H_1'$: $g$. Suppose now that $w'$ satisfies (75)
and (76). Then

$$\int_{w'} h(x)\, dx \le \epsilon, \tag{79}$$

and $w'$ is most powerful for testing $H_0'$ against $H_1'$. It follows that $w'$ differs
from $w$ at most by a null set.

³ One sees this easily from Neyman and Pearson's proof of the fundamental lemma [1],
by using the assumption that the set of points for which equality holds in (8) has
measure zero.

Earlier we enlarged the problem of testing by adjoining to the original random
variable $X$ a random variable with a known distribution. This is equivalent
to the following modification of the original problem. Instead of defining a test
to be a critical region (of rejection) in the space of $x$, we define it to be a critical
function $\varphi$ ($0 \le \varphi(x) \le 1$) which with every point $x$ associates a probability of
rejection $\varphi(x)$. If $x$ is observed, the hypothesis is rejected with probability $\varphi(x)$
according to a table of random numbers. In the case where random numbers
are not employed, $\varphi$ merely becomes the characteristic function of the set $w$.

We shall now state a theorem which will prove admissibility for all but one of
those problems treated in sections two to five to which theorem 3 does not apply.

Theorem 4. Suppose $\omega = (\theta)$ is a subset of an $s$-dimensional Euclidean space,
and that for any measurable function $\varphi$ and for any set $S$ which has positive measure
and is contained in $\omega$,

$$\int \varphi(x) f_\theta(x)\, dx = c \quad \text{for } \theta \in S \tag{80}$$

implies

$$\int \varphi(x) f_\theta(x)\, dx = c \quad \text{for } \theta \in \omega. \tag{81}$$

(Here and in all that follows, whenever a region of integration is not indicated, the
integral extends over the whole $x$ space.) Suppose further that $\varphi$ is a critical function
satisfying the conditions of theorem 1 and that the set $S_0$ of points of increase of $\lambda$
has positive measure. Then $\varphi$ is admissible.

Proof. If $\varphi$ were not admissible there would exist $\varphi_1$ with

$$\int \varphi_1(x) g(x)\, dx = \int \varphi(x) g(x)\, dx, \tag{82}$$

$$\int \varphi_1(x) f_\theta(x)\, dx \le \int \varphi(x) f_\theta(x)\, dx \quad \text{for all } \theta \in \omega, \tag{83}$$

$$\int \varphi_1(x) f_\theta(x)\, dx < \int \varphi(x) f_\theta(x)\, dx \quad \text{for some } \theta \in \omega. \tag{84}$$

The set $T$ of points $\theta$ for which (84) holds differs from $\omega$ at most by a null set.
For

$$\int [\varphi_1(x) - \varphi(x)] f_\theta(x)\, dx = 0 \quad \text{for } \theta \in \omega - T, \tag{85}$$

and if $\omega - T$ had positive measure, (85) would hold for all $\theta \in \omega$, contradicting (84).

Let $h$ and $H_0'$ be defined as in the proof of theorem 3. Since $S_0$ has positive
measure, it follows that

$$\epsilon = \int \varphi(x) h(x)\, dx > \int \varphi_1(x) h(x)\, dx = \eta, \text{ say}. \tag{86}$$

Let $\varphi_2(x) = \min\{1, \varphi_1(x) + \epsilon - \eta\}$. Then

$$\int \varphi_2(x) h(x)\, dx \le \epsilon \tag{87}$$

and

$$\int \varphi_2(x) g(x)\, dx > \int \varphi_1(x) g(x)\, dx. \tag{88}$$

But $\varphi$ is most powerful for testing $H_0'$ against $H_1'$, and by (82) we have a contradiction.

By applying theorems 3 and 4 one can easily show for all but one of the problems
treated in sections three to five that the tests obtained there are admissible.
The one exception occurs when testing equality of variances. Simplifying the
notation, since we are now concerned with a special case, we shall assume that
$X_i$ ($i = 1, \dots, n$), $Y_1, \dots, Y_r$ are independently and normally distributed,
the $X$'s with mean $\xi_0$ and variance $\sigma_0^2$, $Y_i$ with mean $\xi_i$ and variance $\sigma_i^2$, all
parameters being unknown. We wish to test the hypothesis of equality of variances
against the simple alternative

$$H_1: \xi_i = \xi_{i1},\ \sigma_i = \sigma_{i1} \quad (i = 0, \dots, r),$$

with

$$\sigma_{01} < \sigma_{11} < \dots < \sigma_{r1}.$$

We shall first consider the case $n = 1$, and prove admissibility of the critical
function

$$\varphi(x, y_1, \dots, y_r) = \epsilon \tag{89}$$

by using a different distribution function for the parameters from the one used
earlier. With some specialization of the distribution function, (8) becomes
for our problem

$$\frac{\exp\left[-\frac{1}{2\sigma_{01}^2}(x-\xi_{01})^2-\sum_{i=1}^{r}\frac{1}{2\sigma_{i1}^2}(y_i-\xi_{i1})^2\right]}{\int\left\{\int\frac{1}{\sigma}\exp\left[-\frac{1}{2\sigma^2}(x-\xi_0)^2\right]d\lambda_0(\xi_0)\cdot\prod_{i=1}^{r}\int\frac{1}{\sigma}\exp\left[-\frac{1}{2\sigma^2}(y_i-\xi_i)^2\right]d\lambda_i(\xi_i)\right\}d\mu(\sigma)} > k. \tag{90}$$

For any $\sigma < \sigma_{01}$ we select the $\lambda_i(\xi_i)$ according to lemma 1. If we then take for
$\mu$ the uniform distribution over $(\sigma_{01} - 1, \sigma_{01})$, the left hand side of (90) will reduce
to $k$. Admissibility of the critical function (89) then follows from theorem 4.

That a constant critical function is not admissible in the case $n > 1$ is easily
seen if one compares it, for instance, with the critical region

$$\frac{|\bar x - \xi_{01}|}{\sqrt{\sum (x_i - \bar x)^2}} < c. \tag{91}$$

We shall not obtain a complete family of admissible tests (cf. [4]) for the case
$n > 1$, but we shall show that this problem is equivalent to the following one: to
find a complete class of unbiased admissible tests for the hypothesis specifying
the mean and variance of a normal distribution on the basis of a sample from
this distribution, the class of alternatives being the totality of univariate normal
distributions.

Let $n > 1$ and let $\phi$ be any most powerful critical function for testing the hypothesis of equality of variances against $H_1$. If $\phi$ corresponds to the level of significance $\epsilon$ and if $\beta_\phi$ denotes the power of $\phi$, we have

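$$\beta_\phi(\sigma, \ldots, \sigma;\ \xi_0, \xi_1, \ldots, \xi_r) \le \epsilon \tag{92}$$

for all admissible values of the arguments. It also follows from section 4 that

$$\beta_\phi(\sigma_{01}, \ldots, \sigma_{01};\ \xi_{01}, \xi_{11}, \ldots, \xi_{r1}) = \epsilon. \tag{93}$$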

Consider for a moment the hypothesis $H_0'$: $\sigma_i = \sigma_{01}$ $(i = 0, \ldots, r)$, $\xi_0 = \xi_{01}$, $\xi_i$ unspecified for $i = 1, \ldots, r$. It is easily seen that the maximum power for testing $H_0'$ against $H_1$ is $\epsilon$. Therefore any most powerful test for testing $H_0$ against $H_1$ is also most powerful for testing $H_0'$ against $H_1$, and in particular this holds for $\phi$. Furthermore, it follows easily from theorem 4 that for any most powerful test of $H_0'$ against $H_1$ the probability of an error of the first kind must be identically equal to $\epsilon$. Therefore

$$\beta_\phi(\sigma_{01}, \ldots, \sigma_{01};\ \xi_{01}, \xi_1, \ldots, \xi_r) = \epsilon \quad \text{for all } \xi_1, \ldots, \xi_r. \tag{94}$$

But (94) is equivalent to the condition that $\phi$ is similar with respect to $\xi_1, \ldots, \xi_r$, and it follows [12] that $\phi$ is a function of $x_1, \ldots, x_n$ only. The problem is therefore reduced to that of finding all admissible critical functions $\phi(x_1, \ldots, x_n)$ satisfying

$$\beta_\phi(\sigma_{01}, \xi_{01}) = \epsilon, \qquad \beta_\phi(\sigma_0, \xi_0) \le \epsilon \quad \text{for all } \sigma_0, \xi_0. \tag{95}$$

That this problem in turn is equivalent to the one stated above is immediate when one considers the complementary critical functions $1 - \phi$.

REFERENCES

[1] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical hypotheses," Roy. Soc. Phil. Trans., Ser. A, Vol. 231 (1933), p. 289.

[2] A. Wald, "Tests of statistical hypotheses concerning several parameters when the number of observations is large," Am. Math. Soc. Trans., Vol. 54 (1943), p. 426.

[3] A. Wald, "Statistical decision functions which minimize the maximum risk," Annals of Math., Vol. 46 (1945), p. 265.

[4] A. Wald, "An essentially complete class of admissible decision functions," Annals of Math. Stat., Vol. 18 (1947), p. 649.

[5] J. Neyman, "On a statistical problem arising in routine analysis and in sampling inspection of mass production," Annals of Math. Stat., Vol. 12 (1941), p. 46.

[6] H. Scheffé, "On the theory of testing composite hypotheses with one constraint," Annals of Math. Stat., Vol. 13 (1942), p. 280.

[7] E. Lehmann, "On optimum tests of composite hypotheses with one constraint," Annals of Math. Stat., Vol. 18 (1947), p. 473.

[8] E. Lehmann and H. Scheffé, "On the problem of similar regions," Proc. Nat. Acad. Sci., Vol. 33 (1947), p. 382.





E. L. LEHMANN AND C. STEIN


[9] A. Wald, "On the power function of the analysis of variance test," Annals of Math. Stat., Vol. 13 (1942), p. 434.

[10] P. L. Hsu, "On the power function of the E²-test and the T²-test," Annals of Math. Stat., Vol. 16 (1945), p. 278.

[11] H. K. Nandi, "On the average power of test criteria," Sankhyā, Vol. 8 (1946), p. 67.

[12] W. Feller, "Note on regions similar to the sample space," Stat. Res. Memoirs, Vol. 2 (1938), p. 117.

[13] P. L. Hsu, "Analysis of variance from the power function standpoint," Biometrika, Vol. 32 (1941), p. 62.

[14] S. Kolodziejczyk, "On an important class of statistical hypotheses," Biometrika, Vol. 27 (1935), p. 161.

[15] P. C. Tang, "The power function of the analysis of variance tests with tables and illustrations of their use," Stat. Res. Memoirs, Vol. 2 (1938), p. 126.

[16] A. Wald, "Foundations of a general theory of sequential decision functions," Econometrica, Vol. 15 (1947), p. 279.



SYMBOLIC MATRIX DERIVATIVES 

By Paul S. Dwyer and M. S. MacPhail
University of Michigan and Queen's University

Summary. Let $X$ be the matrix $[x_{mn}]$, $t$ a scalar, and let $\partial X/\partial t$, $\partial t/\partial X$ denote the matrices $[\partial x_{mn}/\partial t]$, $[\partial t/\partial x_{mn}]$ respectively. Let $Y = [y_{pq}]$ be any matrix product involving $X$, $X'$ and independent matrices, for example $Y = AXBX'C$. Consider the matrix derivatives $\partial Y/\partial x_{mn}$, $\partial y_{pq}/\partial X$. Our purpose is to devise a systematic method for calculating these derivatives. Thus if $Y = AX$, we find that $\partial Y/\partial x_{mn} = AJ_{mn}$, $\partial y_{pq}/\partial X = A'K_{pq}$, where $J_{mn}$ is a matrix of the same dimensions as $X$, with all elements zero except for a unit in the $m$-th row and $n$-th column, and $K_{pq}$ is similarly defined with respect to $Y$. We consider also the derivatives of sums, differences, powers, the inverse matrix and the function of a function, thus setting up a matrix analogue of elementary differential calculus. This is designed for application to statistics, and gives a concise and suggestive method for treating such topics as multiple regression and canonical correlation.


1. Introduction. The derivative of a matrix with respect to a scalar

$$\frac{\partial Y}{\partial x} = \frac{\partial}{\partial x}[y_{pq}] = \left[\frac{\partial y_{pq}}{\partial x}\right] \tag{1}$$

is well known and commonly used. The symbolic derivative obtained by applying a matrix of differential operators to a scalar

$$\frac{\partial y}{\partial X} = \left[\frac{\partial}{\partial x_{mn}}\right] y = \left[\frac{\partial y}{\partial x_{mn}}\right] \tag{2}$$

is not in such general use, though some authors give special cases. For example, if $A$ is a symmetric matrix and $X$ a column matrix, so that $y = X'AX$ is a quadratic form, Frazer, Duncan and Collar [1, p. 48] write


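$$\begin{bmatrix} \partial/\partial x_1 \\ \partial/\partial x_2 \\ \vdots \\ \partial/\partial x_k \end{bmatrix} y = 2AX \tag{3}$$

to indicate concisely the result of differentiating $y$ with respect to the elements of $X$.

It is to be noted that the matrix in (1) has the same dimensions (numbers of rows and columns) as the matrix $Y$, while the matrix in (2) has the dimensions of the matrix $X$.

We present an illustration of each of these types of symbolic matrix derivatives in order to clarify the concepts. Thus if

$$Y = \begin{bmatrix} x & 2x^3 \\ 3x^{-4} & e^x \\ \sin x & \log x \end{bmatrix}, \qquad \text{we have} \qquad \frac{\partial Y}{\partial x} = \begin{bmatrix} 1 & 6x^2 \\ -12x^{-5} & e^x \\ \cos x & 1/x \end{bmatrix},$$

and if $y = x_{11}x_{32} - x_{31}x_{12}$ and

$$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \end{bmatrix},$$

we have

$$\frac{\partial y}{\partial X} = \begin{bmatrix} x_{32} & -x_{31} \\ 0 & 0 \\ -x_{12} & x_{11} \end{bmatrix}.$$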

Suppose $Y$ is any matrix product involving $X$, $X'$ and independent matrices, for example, $Y = AXBX'C$. We may fix an element $x_{mn}$ of $X$ and form the matrix

$$\frac{\partial Y}{\partial x_{mn}}, \tag{4}$$

or we may fix an element $y_{pq}$ of $Y$ and form the matrix

$$\frac{\partial y_{pq}}{\partial X}. \tag{5}$$


The purpose of this paper is to devise a systematic method for calculating these 
matrices, and to give various applications in the general field of statistics. 

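By way of introduction we take the matrix product $Y = AX$ where

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \quad \text{and} \quad X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \end{bmatrix},$$

so that

$$Y = \begin{bmatrix} a_{11}x_{11} + a_{12}x_{21} + a_{13}x_{31} & a_{11}x_{12} + a_{12}x_{22} + a_{13}x_{32} \\ a_{21}x_{11} + a_{22}x_{21} + a_{23}x_{31} & a_{21}x_{12} + a_{22}x_{22} + a_{23}x_{32} \end{bmatrix}.$$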



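We have then

$$\frac{\partial Y}{\partial x_{11}} = \begin{bmatrix} a_{11} & 0 \\ a_{21} & 0 \end{bmatrix}, \quad \frac{\partial Y}{\partial x_{12}} = \begin{bmatrix} 0 & a_{11} \\ 0 & a_{21} \end{bmatrix},$$
$$\frac{\partial Y}{\partial x_{21}} = \begin{bmatrix} a_{12} & 0 \\ a_{22} & 0 \end{bmatrix}, \quad \frac{\partial Y}{\partial x_{22}} = \begin{bmatrix} 0 & a_{12} \\ 0 & a_{22} \end{bmatrix},$$
$$\frac{\partial Y}{\partial x_{31}} = \begin{bmatrix} a_{13} & 0 \\ a_{23} & 0 \end{bmatrix}, \quad \frac{\partial Y}{\partial x_{32}} = \begin{bmatrix} 0 & a_{13} \\ 0 & a_{23} \end{bmatrix}.$$

These six equations can be combined in the single one

$$\frac{\partial Y}{\partial x_{mn}} = AJ_{mn}, \tag{6}$$

where $J_{mn}$ is a matrix having the dimensions of $X$, with all elements zero except for a unit element in the $m$-th row and $n$-th column. Similarly we find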



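$$\frac{\partial y_{11}}{\partial X} = \begin{bmatrix} a_{11} & 0 \\ a_{12} & 0 \\ a_{13} & 0 \end{bmatrix}, \quad \frac{\partial y_{12}}{\partial X} = \begin{bmatrix} 0 & a_{11} \\ 0 & a_{12} \\ 0 & a_{13} \end{bmatrix},$$
$$\frac{\partial y_{21}}{\partial X} = \begin{bmatrix} a_{21} & 0 \\ a_{22} & 0 \\ a_{23} & 0 \end{bmatrix}, \quad \frac{\partial y_{22}}{\partial X} = \begin{bmatrix} 0 & a_{21} \\ 0 & a_{22} \\ 0 & a_{23} \end{bmatrix}.$$

These four equations can be combined in the single one

$$\frac{\partial y_{pq}}{\partial X} = A'K_{pq}, \tag{7}$$

where $K_{pq}$ is the matrix having the dimensions of $Y$ with all elements zero except for a unit element in the $p$-th row and $q$-th column.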
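As a numerical sanity check of (6) and (7), the sketch below compares both formulas with central finite differences. It assumes NumPy and random values for $A$ and $X$. The helper names `unit`, `dY_dx` and `dy_dX` are illustrative only, and indices are 0-based in the code where the text counts from 1. Because $Y = AX$ is linear in $X$, the differences should agree with the formulas up to rounding.

```python
import numpy as np

# Illustrative shapes matching the worked example: A is 2 x 3, X is 3 x 2, Y = AX is 2 x 2.
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
X = rng.normal(size=(3, 2))

def f(Z):
    """The matrix function Y = AZ."""
    return A @ Z

def unit(shape, i, j):
    """All zeros except a unit element in row i, column j (J_mn or K_pq, 0-based)."""
    E = np.zeros(shape)
    E[i, j] = 1.0
    return E

def dY_dx(f, X, m, n, h=1e-6):
    """Central-difference estimate of the matrix dY/dx_mn (dimensions of Y)."""
    E = unit(X.shape, m, n)
    return (f(X + h * E) - f(X - h * E)) / (2 * h)

def dy_dX(f, X, p, q, h=1e-6):
    """Central-difference estimate of the matrix dy_pq/dX (dimensions of X)."""
    G = np.zeros(X.shape)
    for m in range(X.shape[0]):
        for n in range(X.shape[1]):
            G[m, n] = dY_dx(f, X, m, n, h)[p, q]
    return G

Y = f(X)
# (6): dY/dx_mn = A J_mn for every element x_mn of X
ok6 = all(np.allclose(dY_dx(f, X, m, n), A @ unit(X.shape, m, n))
          for m in range(X.shape[0]) for n in range(X.shape[1]))
# (7): dy_pq/dX = A'K_pq for every element y_pq of Y
ok7 = all(np.allclose(dy_dX(f, X, p, q), A.T @ unit(Y.shape, p, q))
          for p in range(Y.shape[0]) for q in range(Y.shape[1]))
print(ok6, ok7)  # True True
```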

It should be noted that the matrices on the left of (6) and (7) are matrices composed of the basic elements

$$\frac{\partial y_{pq}}{\partial x_{mn}}.$$


Other types of symbolic matrix derivatives could be defined and studied. We have selected these two main types because of their application to regression and correlation theory. The second type is the one more specifically needed in the applications, but the relations between the types are such that a simultaneous treatment seems appropriate.


2. Notation. Capital letters are used for matrices and small letters for scalars. It is understood that $Y, U, V, \ldots$ are matrices whose elements are functions of the elements $x_{mn}$ of $X$, and that $A, B, \ldots$ (unless otherwise stated) are matrices whose elements are not functions of $x_{mn}$. In the development of the formulas it is understood that the differentiation is carried out with respect to $x_{mn}$ or $X$. The matrix function differentiated is called $Y$.

We have already defined $J_{mn}$ as the matrix having the dimensions of $X$ with all elements zero except for a unit element in the $m$-th row and the $n$-th column, and we define $K_{pq}$ similarly with respect to $Y$. We now define $J'_{nm}$ as the matrix having the dimensions of $X'$ with all elements zero except for a unit element in the $n$-th row and the $m$-th column, and we define $K'_{qp}$ similarly with respect to $Y'$. All the formulas we obtain for $\partial Y/\partial x_{mn}$ involve $J_{mn}$ or $J'_{nm}$, while all those for $\partial y_{pq}/\partial X$ involve $K_{pq}$ or $K'_{qp}$.

9A 

3. Differentiation of a constant. If 7 = A = [a M ] we have at once 


^Vl>1 _ Q 

dXmn 

It follows that 



where the zero matrix of (8) has the dimensions of A , while that of (9) has the 
dimensions of X. 


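4. Differentiation of a matrix with respect to itself. If $Y = X = [x_{pq}]$ we note that

$$\frac{\partial y_{pq}}{\partial x_{mn}} = \frac{\partial x_{pq}}{\partial x_{mn}} = \begin{cases} 1 & (p = m,\ q = n), \\ 0 & \text{(otherwise)}. \end{cases}$$

It follows that

$$\frac{\partial Y}{\partial x_{mn}} = \frac{\partial}{\partial x_{mn}}[x_{pq}] = J_{mn}, \tag{10}$$
$$\frac{\partial y_{pq}}{\partial X} = \frac{\partial x_{pq}}{\partial X} = K_{pq}. \tag{11}$$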



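5. Differentiation of the transpose of a matrix with respect to the matrix. Let $Y = X'$, so that

$$y_{pq} = x_{qp}.$$

Then

$$\frac{\partial y_{pq}}{\partial x_{mn}} = \frac{\partial x_{qp}}{\partial x_{mn}} = \begin{cases} 1 & (q = m,\ p = n), \\ 0 & \text{(otherwise)}, \end{cases}$$

and we have

$$\frac{\partial X'}{\partial x_{mn}} = J'_{nm}, \tag{12}$$
$$\frac{\partial y_{pq}}{\partial X} = \frac{\partial x_{qp}}{\partial X} = K'_{qp}, \tag{13}$$

where $J'_{nm}$, $K'_{qp}$ are defined as in section 2.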

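6. Differentiation of sums and differences of matrices. If

$$Y = U + V - W = [u_{pq} + v_{pq} - w_{pq}],$$

we have

$$\frac{\partial y_{pq}}{\partial x_{mn}} = \frac{\partial u_{pq}}{\partial x_{mn}} + \frac{\partial v_{pq}}{\partial x_{mn}} - \frac{\partial w_{pq}}{\partial x_{mn}},$$

then

$$\frac{\partial Y}{\partial x_{mn}} = \frac{\partial U}{\partial x_{mn}} + \frac{\partial V}{\partial x_{mn}} - \frac{\partial W}{\partial x_{mn}}, \tag{14}$$

and similarly

$$\frac{\partial y_{pq}}{\partial X} = \frac{\partial u_{pq}}{\partial X} + \frac{\partial v_{pq}}{\partial X} - \frac{\partial w_{pq}}{\partial X}. \tag{15}$$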


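7. General formulas for the differentiation of a two factor matrix product. Suppose $U$ is a matrix with $c$ rows and $d$ columns and $V$ is a matrix with $d$ rows and $e$ columns; then

$$Y = UV = [y_{pq}] = \left[\sum_{s=1}^{d} u_{ps} v_{sq}\right]. \tag{16}$$

We have at once

$$\frac{\partial y_{pq}}{\partial x_{mn}} = \sum_{s=1}^{d} \frac{\partial u_{ps}}{\partial x_{mn}} v_{sq} + \sum_{s=1}^{d} u_{ps} \frac{\partial v_{sq}}{\partial x_{mn}}. \tag{17}$$

Now considering any fixed $x_{mn}$, it is clear that the first term on the right of (17) is the same as the right hand term of (16) with $\partial u_{ps}/\partial x_{mn}$ in place of $u_{ps}$. The second term on the right of (17) is likewise the same as the right hand term of (16) with $\partial v_{sq}/\partial x_{mn}$ in place of $v_{sq}$. We may then write

$$\frac{\partial Y}{\partial x_{mn}} = \frac{\partial U}{\partial x_{mn}} V + U \frac{\partial V}{\partial x_{mn}}. \tag{18}$$

Also, considering a fixed $y_{pq}$, we have

$$\frac{\partial y_{pq}}{\partial X} = \sum_{s=1}^{d} \frac{\partial u_{ps}}{\partial X} v_{sq} + \sum_{s=1}^{d} u_{ps} \frac{\partial v_{sq}}{\partial X}. \tag{19}$$

It is to be noted that this formula yields matrices of the proper dimensions (those of $X$), since $\partial u_{ps}/\partial X$ and $\partial v_{sq}/\partial X$ have the dimensions of $X$. These matrices, when multiplied by the scalar values $v_{sq}$ and $u_{ps}$ and summed, yield matrices of the desired dimensions.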


8. Some properties of matrix products involving J's and K's. Before deriving formulas for the differentiation of products of specific factors, it seems wise to derive some formulas exhibiting certain relations involving the J's and K's. Consider the matrix $A$ having $c$ rows and $d$ columns and the matrix $X$ having $d$ rows and $e$ columns. Then $Y = AX$ is a matrix with $c$ rows and $e$ columns, $J_{mn}$ one with $d$ rows and $e$ columns, $J'_{nm}$ one with $e$ rows and $d$ columns, $K_{pq}$ one with $c$ rows and $e$ columns, and $K'_{qp}$ one with $e$ rows and $c$ columns. It is easily seen by actual multiplication that

(20) $AJ_{mn}$ is a $c \times e$ matrix with all its elements zero except those of its $n$-th column, which are those of the $m$-th column of $A$.

We omit further discussion of the dimensions of the matrices and assume that whenever a matrix product is written, the factors are conformable. Then we can show similarly that

(21) $J_{mn}B$ is a matrix with all its elements zero except those of its $m$-th row, which are those of the $n$-th row of $B$.

Similar statements hold if $J_{mn}$ is replaced by $J'_{nm}$ or $K_{pq}$ or $K'_{qp}$. The rules are

(a) When $J_{mn}$ (or $J'_{nm}$ or $K_{pq}$ or $K'_{qp}$) is the postmultiplier, the first subscript indicates the column of the other matrix which is placed in the column indicated by the second subscript.

(b) When $J_{mn}$ (or $J'_{nm}$ or $K_{pq}$ or $K'_{qp}$) is the premultiplier, the second subscript indicates the row of the other matrix which is placed in the row indicated by the first subscript.

Notice also that

(22) $A'K_{pq}$ is a matrix with all its elements zero except those of its $q$-th column, which are those of the $p$-th column of $A'$, or the $p$-th row of $A$. A similar result holds if $K_{pq}$ is replaced by $K'_{qp}$ or $J_{mn}$ or $J'_{nm}$.
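Statements (20) to (22) can be confirmed by direct multiplication. This NumPy sketch uses small integer-valued matrices chosen only for illustration. The helpers `unit`, `only_column` and `only_row` are hypothetical, and the indices are 0-based.

```python
import numpy as np

def unit(shape, i, j):
    """All zeros except a unit element in row i, column j (0-based)."""
    E = np.zeros(shape)
    E[i, j] = 1.0
    return E

def only_column(M, j):
    """True if every column of M other than column j is zero."""
    return not np.any(np.delete(M, j, axis=1))

def only_row(M, i):
    """True if every row of M other than row i is zero."""
    return not np.any(np.delete(M, i, axis=0))

A = np.arange(1.0, 13.0).reshape(3, 4)   # c = 3 rows, d = 4 columns
B = np.arange(1.0, 21.0).reshape(4, 5)   # 4 rows, 5 columns

# (20): A J_mn keeps only column n, filled with column m of A (J_mn is d x e = 4 x 5).
m, n = 2, 3
AJ = A @ unit((4, 5), m, n)
rule20 = only_column(AJ, n) and np.array_equal(AJ[:, n], A[:, m])

# (21): J_mn B keeps only row m, filled with row n of B (here J_mn is 3 x 4).
JB = unit((3, 4), m, n) @ B
rule21 = only_row(JB, m) and np.array_equal(JB[m, :], B[n, :])

# (22): A'K_pq keeps only column q, filled with row p of A (K_pq is c x e = 3 x 5).
p, q = 1, 4
AtK = A.T @ unit((3, 5), p, q)
rule22 = only_column(AtK, q) and np.array_equal(AtK[:, q], A[p, :])

print(rule20, rule21, rule22)  # True True True
```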


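9. Differentiation of specific two factor products. Let us start with $Y = AX$, where the various matrices involved have the dimensions indicated in the last section. Application of (18), (8), (10) gives

$$\frac{\partial Y}{\partial x_{mn}} = \frac{\partial A}{\partial x_{mn}}X + A\frac{\partial X}{\partial x_{mn}} = AJ_{mn}, \tag{23}$$

while application of (19), (11) yields

$$\frac{\partial y_{pq}}{\partial X} = \sum_{s=1}^{d} \frac{\partial a_{ps}}{\partial X} x_{sq} + \sum_{s=1}^{d} a_{ps}\frac{\partial x_{sq}}{\partial X} = a_{p1}K_{1q} + a_{p2}K_{2q} + \cdots + a_{pd}K_{dq}, \tag{24}$$

which is a $d \times e$ matrix with all elements zero except those of its $q$-th column, which are those of the $p$-th row of $A$; that is,

$$\frac{\partial y_{pq}}{\partial X} = A'K_{pq}$$

by (22).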

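Similar treatment of $Y = XB$ yields

$$\frac{\partial Y}{\partial x_{mn}} = \frac{\partial X}{\partial x_{mn}}B = J_{mn}B, \tag{25}$$
$$\frac{\partial y_{pq}}{\partial X} = \sum_{s} \frac{\partial x_{ps}}{\partial X} b_{sq} = \sum_{s} K_{ps} b_{sq} = K_{pq}B'. \tag{26}$$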


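If we treat $Y = AX'$ in a similar fashion, we get

$$\frac{\partial Y}{\partial x_{mn}} = AJ'_{nm}, \tag{27}$$
$$\frac{\partial y_{pq}}{\partial X} = K'_{qp}A, \tag{28}$$

while $Y = X'B$ yields

$$\frac{\partial Y}{\partial x_{mn}} = J'_{nm}B, \tag{29}$$
$$\frac{\partial y_{pq}}{\partial X} = BK'_{qp}. \tag{30}$$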


It is to be noted that $J$ always has the subscripts $mn$, and similarly we find always $J'_{nm}$, $K_{pq}$, $K'_{qp}$. We may therefore omit the subscripts on these letters. When we do so we shall also write

$$\frac{\partial Y}{\partial (X)} \text{ for } \frac{\partial Y}{\partial x_{mn}}, \qquad \frac{\partial (Y)}{\partial X} \text{ for } \frac{\partial y_{pq}}{\partial X},$$

placing brackets ( ) around the matrix from which a fixed element is to be chosen. Thus if $Y = AX$, we write instead of (23) and (24)

$$\frac{\partial Y}{\partial (X)} = AJ, \tag{23a}$$
$$\frac{\partial (Y)}{\partial X} = A'K. \tag{24a}$$

The other results are summarized in lines 1-5 of Table I.
Examination of (18) and (19) shows that the derivatives of products with two variable factors are obtained by adding the results obtained by holding each factor constant while differentiating the other. With this in mind, (23)-(30) can be used to obtain the derivatives of double products involving $X$ and $X'$. Thus if $Y = XX$, we get

$$\frac{\partial Y}{\partial (X)} = JX + XJ, \qquad \frac{\partial (Y)}{\partial X} = KX' + X'K. \tag{31}$$

Other double product formulas involving $X$ and $X'$ are given in Table I.


TABLE I

Formula   Y       ∂Y/∂(X)          ∂(Y)/∂X
1         AB      0                0
2         AX      AJ               A'K
3         XB      JB               KB'
4         AX'     AJ'              K'A
5         X'B     J'B              BK'
6         XX      JX + XJ          KX' + X'K
7         X'X     J'X + X'J        XK' + XK
8         XX'     JX' + XJ'        KX + K'X
9         X'X'    J'X' + X'J'      X'K' + K'X'


The formulas for $\partial Y/\partial (X)$ are written down very easily, but those for $\partial (Y)/\partial X$ are not so easy to write. However, the values of $\partial Y/\partial (X)$ and $\partial (Y)/\partial X$ in formulas 2-5 of Table I are such that the results for $\partial (Y)/\partial X$ may be obtained from those for $\partial Y/\partial (X)$ with the use of a few simple rules. They are

(a) Each $J$ becomes $K$ and each $J'$ becomes $K'$.

(b) The pre (or post) multiplier of $J$ becomes its transpose.

(c) The pre (or post) multiplier of $J'$ becomes a post (or pre) multiplier of $K'$.

These rules are immediately applicable to the double products. Thus when $Y = X'X$ we have

$$\frac{\partial Y}{\partial (X)} = J'X + X'J,$$

and so

$$\frac{\partial (Y)}{\partial X} = XK' + XK.$$

10. Differentiation of three (or more) factor products. Products with three factors can be differentiated by the formulas of the last section if two adjacent factors are constant. Thus if $Y = ABX$, we have

$$\frac{\partial Y}{\partial (X)} = ABJ, \qquad \frac{\partial (Y)}{\partial X} = B'A'K. \tag{32}$$

It is not yet demonstrated that these rules are applicable to the products $AXB$ and $AX'B$. However, it can be shown by the general methods indicated earlier that if $Y = AXB$, we obtain

$$\frac{\partial Y}{\partial (X)} = AJB, \qquad \frac{\partial (Y)}{\partial X} = A'KB', \tag{33}$$

while if $Y = AX'B$ we have

$$\frac{\partial Y}{\partial (X)} = AJ'B, \qquad \frac{\partial (Y)}{\partial X} = BK'A. \tag{34}$$

It is now apparent that the rules of the last section apply to situations in which there are both pre and post multipliers.

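The general theory for two-factor products is immediately extendable. Thus if $Y = UVW$ with $y_{pq} = \sum_r \sum_s u_{pr} v_{rs} w_{sq}$, then the basic element is

$$\frac{\partial y_{pq}}{\partial x_{mn}} = \sum_r\sum_s \frac{\partial u_{pr}}{\partial x_{mn}} v_{rs} w_{sq} + \sum_r\sum_s u_{pr}\frac{\partial v_{rs}}{\partial x_{mn}} w_{sq} + \sum_r\sum_s u_{pr} v_{rs}\frac{\partial w_{sq}}{\partial x_{mn}}, \tag{35}$$

and the formulas result from treating each factor in turn as the only variable. For example if $Y = XX'X$, we have

$$\frac{\partial Y}{\partial (X)} = JX'X + XJ'X + XX'J, \tag{36}$$

and

$$\frac{\partial (Y)}{\partial X} = K(X'X)' + XK'X + (XX')'K = KX'X + XK'X + XX'K. \tag{37}$$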
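The formulas (33), (34) and (37) can be checked in the same way. This time both $\partial Y/\partial (X)$ and $\partial (Y)/\partial X$ are read off one table of partial derivatives $\partial y_{pq}/\partial x_{mn}$. The sketch assumes NumPy and random rectangular matrices with conformable shapes. `agrees` is a hypothetical helper, and $J'$ and $K'$ are formed as `J.T` and `K.T`.

```python
import numpy as np

rng = np.random.default_rng(2)

def unit(shape, i, j):
    """All zeros except a unit element in row i, column j (0-based)."""
    E = np.zeros(shape)
    E[i, j] = 1.0
    return E

def agrees(f, X, rule_J, rule_K, h=1e-6):
    """Check dY/d(X) against rule_J(J_mn) and d(Y)/dX against rule_K(K_pq) for Y = f(X)."""
    Y = f(X)
    # D[m, n, p, q] = central-difference estimate of dy_pq/dx_mn
    D = np.empty(X.shape + Y.shape)
    for m in range(X.shape[0]):
        for n in range(X.shape[1]):
            E = unit(X.shape, m, n)
            D[m, n] = (f(X + h * E) - f(X - h * E)) / (2 * h)
    ok_J = all(np.allclose(D[m, n], rule_J(unit(X.shape, m, n)))
               for m in range(X.shape[0]) for n in range(X.shape[1]))
    ok_K = all(np.allclose(D[:, :, p, q], rule_K(unit(Y.shape, p, q)))
               for p in range(Y.shape[0]) for q in range(Y.shape[1]))
    return ok_J and ok_K

# (33): Y = AXB with A 2x3, X 3x4, B 4x5
A, X, B = rng.normal(size=(2, 3)), rng.normal(size=(3, 4)), rng.normal(size=(4, 5))
eq33 = agrees(lambda Z: A @ Z @ B, X, lambda J: A @ J @ B, lambda K: A.T @ K @ B.T)

# (34): Y = AX'B with A 2x4, X 3x4, B 3x5 (J' is J.T and K' is K.T)
A2, X2, B2 = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 5))
eq34 = agrees(lambda Z: A2 @ Z.T @ B2, X2, lambda J: A2 @ J.T @ B2, lambda K: B2 @ K.T @ A2)

# (36) and (37): Y = XX'X with X 3x2
X3 = rng.normal(size=(3, 2))
eq37 = agrees(lambda Z: Z @ Z.T @ Z, X3,
              lambda J: J @ X3.T @ X3 + X3 @ J.T @ X3 + X3 @ X3.T @ J,
              lambda K: K @ X3.T @ X3 + X3 @ K.T @ X3 + X3 @ X3.T @ K)
print(eq33, eq34, eq37)  # True True True
```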

The symbolic derivatives of certain triple product matrices are presented in Table II.

The rules are sufficiently general to take care of matrices with more than three factors. Thus if $Y = A'X'XB$, we have

$$\frac{\partial Y}{\partial (X)} = A'J'XB + A'X'JB \tag{38}$$

and

$$\frac{\partial (Y)}{\partial X} = XBK'A' + XAKB', \tag{39}$$

and in the special case $B = A$, we get

$$\frac{\partial Y}{\partial (X)} = A'(J'X + X'J)A, \tag{40}$$
$$\frac{\partial (Y)}{\partial X} = XA(K' + K)A'. \tag{41}$$

Similarly if $Y = X'A'AX$, we get

$$\frac{\partial Y}{\partial (X)} = J'A'AX + X'A'AJ \tag{42}$$

and

$$\frac{\partial (Y)}{\partial X} = A'AXK' + A'AXK. \tag{43}$$


TABLE II

Formula   Y        ∂Y/∂(X)                     ∂(Y)/∂X
1         ABC      0                           0
2         ABX      ABJ                         B'A'K
3         AXC      AJC                         A'KC'
4         XBC      JBC                         KC'B'
5         ABX'     ABJ'                        K'AB
6         AX'C     AJ'C                        CK'A
7         X'BC     J'BC                        BCK'
8         AXX      AJX + AXJ                   A'KX' + X'A'K
9         XBX      JBX + XBJ                   KX'B' + B'X'K
10        XXC      JXC + XJC                   KC'X' + X'KC'
11        AX'X'    AJ'X' + AX'J'               X'K'A + K'AX'
12        X'BX'    J'BX' + X'BJ'               BX'K' + K'X'B
13        X'X'C    J'X'C + X'J'C               X'CK' + CK'X'
14        AX'X     AJ'X + AX'J                 XK'A + XA'K
15        X'BX     J'BX + X'BJ                 BXK' + B'XK
16        X'XC     J'XC + X'JC                 XCK' + XKC'
17        AXX'     AJX' + AXJ'                 A'KX + K'AX
18        XBX'     JBX' + XBJ'                 KXB' + K'XB
19        XX'C     JX'C + XJ'C                 KC'X + CK'X
20        XXX      JXX + XJX + XXJ             KX'X' + X'KX' + X'X'K
21        XXX'     JXX' + XJX' + XXJ'          KXX' + X'KX + K'XX
22        XX'X     JX'X + XJ'X + XX'J          KX'X + XK'X + XX'K
23        X'XX     J'XX + X'JX + X'XJ          XXK' + XKX' + X'XK
24        XX'X'    JX'X' + XJ'X' + XX'J'       KXX + X'K'X + K'XX'
25        X'XX'    J'XX' + X'JX' + X'XJ'       XX'K' + XKX + K'X'X
26        X'X'X    J'X'X + X'J'X + X'X'J       X'XK' + XK'X' + XXK
27        X'X'X'   J'X'X' + X'J'X' + X'X'J'    X'X'K' + X'K'X' + K'X'X'


Finally if $Y = XAX'AX$, we get

$$\frac{\partial Y}{\partial (X)} = JAX'AX + XAJ'AX + XAX'AJ, \tag{44}$$
$$\frac{\partial (Y)}{\partial X} = KX'A'XA' + AXK'XA + A'XA'X'K. \tag{45}$$


11. Vector results. It should be emphasized that each of the above results is a general result. More specific results may be obtained in case one (or more) of the matrices is a vector. For example if $X_c$ is a column matrix and $Y = X_c'BX_c$, then $Y$ is a scalar, so $K$ and $K'$ are both unity and we have from Table II (15)

$$\frac{\partial (Y)}{\partial X_c} = BX_c + B'X_c = (B + B')X_c. \tag{46}$$

If in addition $B$ is symmetric, $B' = B$ and we have

$$\frac{\partial (Y)}{\partial X_c} = 2BX_c,$$

which is the result indicated in (3).
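Here is a quick numerical illustration of (46), assuming NumPy and a random non-symmetric $B$. The finite-difference gradient of $X_c'BX_c$ should match $(B + B')X_c$. Replacing $B$ by its symmetric part gives the $2BX_c$ form.

```python
import numpy as np

rng = np.random.default_rng(3)
k = 4
B = rng.normal(size=(k, k))   # deliberately not symmetric
x = rng.normal(size=(k, 1))   # the column matrix X_c

def quad(v):
    """Y = X_c' B X_c, returned as a scalar (K = K' = 1)."""
    return (v.T @ B @ v)[0, 0]

h = 1e-6
grad_fd = np.zeros((k, 1))
for m in range(k):
    e = np.zeros((k, 1))
    e[m, 0] = h
    grad_fd[m, 0] = (quad(x + e) - quad(x - e)) / (2 * h)

grad_46 = (B + B.T) @ x                    # formula (46)
S = (B + B.T) / 2                          # symmetric matrix with the same quadratic form
ok46 = np.allclose(grad_fd, grad_46)
ok_sym = np.allclose(grad_46, 2 * S @ x)   # the symmetric case, 2BX_c
print(ok46, ok_sym)  # True True
```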

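12. Differentiation of the inverse of X. It is possible to use implicit differentiation to derive formulas for $\partial X^{-1}/\partial (X)$ and $\partial (X^{-1})/\partial X$. We write $I = XX^{-1}$ and get

$$0 = \frac{\partial X}{\partial (X)}X^{-1} + X\frac{\partial X^{-1}}{\partial (X)} = JX^{-1} + X\frac{\partial X^{-1}}{\partial (X)},$$

so that

$$\frac{\partial X^{-1}}{\partial (X)} = -X^{-1}JX^{-1}, \tag{47}$$

whence

$$\frac{\partial (X^{-1})}{\partial X} = -(X^{-1})'K(X^{-1})'. \tag{48}$$

The formula (47) is a generalization of a known matrix differential formula [3, 3.4].

In a similar way we derive

$$\frac{\partial (X')^{-1}}{\partial (X)} = -(X')^{-1}J'(X')^{-1}, \tag{49}$$
$$\frac{\partial ((X')^{-1})}{\partial X} = -(X')^{-1}K'(X')^{-1}. \tag{50}$$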
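Formulas (47) and (48) can be compared with finite differences of `numpy.linalg.inv`. This sketch assumes NumPy and a random $3 \times 3$ matrix shifted by a multiple of $I$ to keep it well conditioned. The tolerances are loose enough to absorb the $O(h^2)$ error of central differences.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(3, 3)) + 4 * np.eye(3)   # shifted so that X is safely invertible
Xi = np.linalg.inv(X)
h = 1e-6

def unit(i, j):
    """3 x 3 matrix of zeros with a unit element at (i, j), 0-based."""
    E = np.zeros((3, 3))
    E[i, j] = 1.0
    return E

# D[m, n] = central-difference estimate of d(X^-1)/dx_mn
D = np.empty((3, 3, 3, 3))
for m in range(3):
    for n in range(3):
        J = unit(m, n)
        D[m, n] = (np.linalg.inv(X + h * J) - np.linalg.inv(X - h * J)) / (2 * h)

# (47): d(X^-1)/dx_mn = -X^-1 J X^-1
ok47 = all(np.allclose(D[m, n], -Xi @ unit(m, n) @ Xi, atol=1e-6)
           for m in range(3) for n in range(3))
# (48): d(X^-1)_pq / dX = -(X^-1)' K (X^-1)'
ok48 = all(np.allclose(D[:, :, p, q], -Xi.T @ unit(p, q) @ Xi.T, atol=1e-6)
           for p in range(3) for q in range(3))
print(ok47, ok48)  # True True
```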


13. Differentiation of a function of a function. The theory developed in the earlier sections is sufficiently general to be useful in differentiating a function of a function if the functions involve addition, subtraction, premultiplication, postmultiplication, and inverse. For example if

$$Y = Z'Z \quad \text{with} \quad Z = AX, \tag{51}$$

we have

$$\frac{\partial Y}{\partial (X)} = \frac{\partial Z'}{\partial (X)}Z + Z'\frac{\partial Z}{\partial (X)},$$

and since

$$\frac{\partial Z'}{\partial (X)} = J'A' \quad \text{and} \quad \frac{\partial Z}{\partial (X)} = AJ,$$

$$\frac{\partial Y}{\partial (X)} = J'A'Z + Z'AJ, \tag{52}$$

and thence

$$\frac{\partial (Y)}{\partial X} = A'ZK' + A'ZK. \tag{53}$$

These results are equivalent to those of (42) and (43).

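14. Differentiation of a power of a square matrix. The values of the symbolic derivatives of $X^2$, $X^3$ with respect to $X$ are given in Tables I and II. It can be shown similarly that if $n$ is a positive integer

$$\frac{\partial X^n}{\partial (X)} = JX^{n-1} + \sum_{r=1}^{n-2} X^r J X^{n-r-1} + X^{n-1}J, \tag{54}$$

and this can be written as

$$\frac{\partial X^n}{\partial (X)} = \sum_{r=0}^{n-1} X^r J X^{n-r-1}, \tag{55}$$

if we adopt the convention that $X^0$ is $I$. It follows at once that

$$\frac{\partial (X^n)}{\partial X} = \sum_{r=0}^{n-1} (X')^r K (X')^{n-r-1}. \tag{56}$$

It is thence possible to derive formulas for the symbolic derivatives of $X^{-n}$. Since $X^{-n}X^n = I$, we have

$$\frac{\partial X^{-n}}{\partial (X)}X^n + X^{-n}\left[\sum_{r=0}^{n-1} X^r J X^{n-r-1}\right] = 0, \tag{57}$$

so

$$\frac{\partial X^{-n}}{\partial (X)} = -X^{-n}\left[\sum_{r=0}^{n-1} X^r J X^{n-r-1}\right]X^{-n}, \tag{58}$$

and

$$\frac{\partial (X^{-n})}{\partial X} = -(X')^{-n}\left[\sum_{r=0}^{n-1} (X')^r K (X')^{n-r-1}\right](X')^{-n}. \tag{59}$$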
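For $n = 3$, (55) and (58) can be checked numerically with `numpy.linalg.matrix_power`. This is an illustrative sketch assuming NumPy and a random, safely invertible $X$. The exponent is called `power` in the code to avoid a clash with the column index $n$ of $x_{mn}$.

```python
import numpy as np
from numpy.linalg import inv, matrix_power

rng = np.random.default_rng(5)
X = rng.normal(size=(3, 3)) + 4 * np.eye(3)   # shifted so that X is safely invertible
power, h = 3, 1e-6                            # 'power' is the exponent n of (54)-(59)

def unit(i, j):
    """3 x 3 matrix of zeros with a unit element at (i, j), 0-based."""
    E = np.zeros((3, 3))
    E[i, j] = 1.0
    return E

def bracket(J, n):
    """Right-hand side of (55): sum over r = 0..n-1 of X^r J X^(n-r-1), with X^0 = I."""
    return sum(matrix_power(X, r) @ J @ matrix_power(X, n - r - 1) for r in range(n))

Xn_inv = inv(matrix_power(X, power))
ok55 = ok58 = True
for i in range(3):
    for j in range(3):
        J = unit(i, j)
        fd_pos = (matrix_power(X + h * J, power) - matrix_power(X - h * J, power)) / (2 * h)
        fd_neg = (inv(matrix_power(X + h * J, power)) - inv(matrix_power(X - h * J, power))) / (2 * h)
        ok55 = ok55 and np.allclose(fd_pos, bracket(J, power), atol=1e-6)                   # (55)
        ok58 = ok58 and np.allclose(fd_neg, -Xn_inv @ bracket(J, power) @ Xn_inv, atol=1e-9)  # (58)
print(ok55, ok58)  # True True
```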

15. Applications. We consider the classical theory of least squares, a matrix presentation of which is available in [2]. Suppose that $y$ and the $x_i$ are measured from their means and that $y$ is to be estimated from the $n$ variables $x_i$. Form the values of $y$ into a column matrix $Y$ and the values of the $x_i$ into an $N$ by $n$ matrix $X$. Introduce the column matrix $B$ of $n$ parameters $b_i$ and define

$$E = Y - XB. \tag{60}$$

Note that the matrix $E'E$ is in this case the single element matrix which is the sum of the squares of the residuals. Following the least squares method we minimize this by differentiating with respect to the elements of $B$. We first note that

$$E'E = (Y' - B'X')(Y - XB) = Y'Y - Y'XB - B'X'Y + B'X'XB. \tag{61}$$

Then we write down first

$$\frac{\partial (E'E)}{\partial (B)} = -Y'XJ - J'X'Y + J'X'XB + B'X'XJ, \tag{62}$$

from which we get

$$\frac{\partial (E'E)}{\partial B} = -X'YK - X'YK' + X'XBK' + X'XBK = -X'(Y - XB)(K + K') = -X'E(K + K'). \tag{63}$$

The J's and K's are associated with $B$ and $E'E$ respectively. Here $E'E$ is scalar, so that $K = K' = 1$ and we have

$$\frac{\partial (E'E)}{\partial B} = -2X'E. \tag{64}$$

The equation $X'E = 0$, obtained by equating the right hand side of (64) to zero, is a statement of the normal equations in matrix form.
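The least squares argument can be illustrated numerically. At an arbitrary $B$, the finite-difference gradient of $E'E$ should agree with $-2X'E$ from (64), and solving the normal equations should make $X'E$ vanish. The data below are synthetic and centered, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(6)
N, n = 50, 3
X = rng.normal(size=(N, n))
X -= X.mean(axis=0)                          # variates measured from their means
Y = X @ np.array([[1.0], [-2.0], [0.5]]) + 0.1 * rng.normal(size=(N, 1))
Y -= Y.mean(axis=0)

def sse(B):
    """E'E for E = Y - XB, returned as a scalar."""
    E = Y - X @ B
    return (E.T @ E)[0, 0]

# (64): at an arbitrary B the gradient of E'E is -2X'E
B0 = rng.normal(size=(n, 1))
h = 1e-5
grad_fd = np.zeros((n, 1))
for i in range(n):
    e = np.zeros((n, 1))
    e[i, 0] = h
    grad_fd[i, 0] = (sse(B0 + e) - sse(B0 - e)) / (2 * h)
ok64 = np.allclose(grad_fd, -2 * X.T @ (Y - X @ B0), rtol=1e-6, atol=1e-6)

# Setting (64) to zero gives the normal equations X'E = 0, i.e. X'X B = X'Y
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
ok_normal = np.allclose(X.T @ (Y - X @ B_hat), 0.0, atol=1e-8)
print(ok64, ok_normal)  # True True
```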

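Equation (64) may also be obtained with the use of the methods of section 13. In this case

$$\frac{\partial E}{\partial (B)} = -XJ, \qquad \frac{\partial E'}{\partial (B)} = -J'X',$$

and we have

$$\frac{\partial (E'E)}{\partial (B)} = \frac{\partial E'}{\partial (B)}E + E'\frac{\partial E}{\partial (B)} = -J'X'E - E'XJ, \tag{65}$$

so

$$\frac{\partial (E'E)}{\partial B} = -X'EK' - X'EK = -X'E(K' + K). \tag{66}$$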


The equation (64) is also applicable to the more general problem in which $y_1$ and $y_2$ are estimated from the same set of variables $x_i$. The only change needed is to regard $Y$, $B$, $E$ as two-column matrices, so that $E'E$ is a matrix with two rows and two columns, which we denote by

$$E'E = \begin{bmatrix} e_{11} & e_{12} \\ e_{21} & e_{22} \end{bmatrix}.$$

We require $\partial e_{11}/\partial B = 0$ and $\partial e_{22}/\partial B = 0$. From equation (63), inserting subscripts, we get

$$\frac{\partial e_{11}}{\partial B} = -X'E(K_{11} + K'_{11}) = -2X'EK_{11}, \qquad \frac{\partial e_{22}}{\partial B} = -2X'EK_{22}.$$

It is easily seen that $\partial e_{11}/\partial B = \partial e_{22}/\partial B = 0$ is equivalent to $X'E = 0$, the same equation as we obtained in the last paragraph. We also arrive at the incidental result that in minimizing $\sum \epsilon_1^2$ and $\sum \epsilon_2^2$ separately we find at the same time a stationary value of $\sum \epsilon_1 \epsilon_2$.

In this way we can treat two or more simultaneous regression problems with this general notation as easily as we can treat one.

As a second application of the theory we outline the initial steps in the direction of the formulas for canonical correlation [4], [5]. In this case $A$ and $B$ are unknown column vectors, with $X$ and $Y$ known rectangular matrices. Then $XA$ is a column matrix whose elements $l_i$ may be regarded as observed values of a linear form $l$. Similarly $YB = \Lambda$, a column matrix whose elements may be regarded as observed values of a linear form $\lambda$. It is desired to find $A$ and $B$ such that $l$ and $\lambda$ may have the largest correlation coefficient, and to find the size of this coefficient. Then $A'X'XA$, $B'Y'YB$, and $B'Y'XA = A'X'YB$ are scalars, and

$$\rho = \frac{B'Y'XA}{\sqrt{(A'X'XA)(B'Y'YB)}}. \tag{67}$$

If the scales of $X$ and $Y$ are chosen so that $A'X'XA = 1$ and $B'Y'YB = 1$, we have

$$\rho = B'Y'XA = A'X'YB. \tag{68}$$

Using Lagrange multipliers we set

$$\phi = B'Y'XA + \frac{c}{2}(1 - A'X'XA) + \frac{d}{2}(1 - B'Y'YB), \tag{69}$$

and differentiate with respect to the elements of $A$ and $B$. We first differentiate $\phi$ with respect to $A$ after replacing $B'Y'XA$ by $A'X'YB$:

$$\frac{\partial \phi}{\partial (A)} = J'X'YB - \frac{c}{2}(J'X'XA + A'X'XJ), \tag{70}$$
$$\frac{\partial (\phi)}{\partial A} = X'YBK' - \frac{c}{2}(X'XAK' + X'XAK). \tag{71}$$

(The J's and K's are associated with $A$ and $\phi$ respectively.) We set $\partial (\phi)/\partial A = 0$ with $K = K' = 1$ to get

$$X'YB = cX'XA, \tag{72}$$

whence, premultiplying by $A'$ and using (68) and $A'X'XA = 1$,

$$\rho = A'X'YB = cA'X'XA = c, \tag{73}$$

and

$$X'YB = \rho X'XA. \tag{74}$$

Similar differentiation with respect to $B$ gives $\rho = d$ and

$$Y'XA = \rho Y'YB. \tag{75}$$

The further steps in the development of canonical correlation theory are based on (74) and (75).
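One way to continue from (74) and (75) is to eliminate $B$, which gives $X'Y(Y'Y)^{-1}Y'XA = \rho^2 X'XA$, and then take the largest root. This is offered as an assumption for illustration, not as the authors' own development. The NumPy sketch uses synthetic centered data. It checks (74) and (75), checks the scaling $B'Y'YB = 1$, and checks that the correlation of $XA$ with $YB$ equals $\rho$.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 200
z = rng.normal(size=N)                        # shared signal (illustrative synthetic data)
X = np.column_stack([z + rng.normal(size=N), rng.normal(size=N), z + rng.normal(size=N)])
Y = np.column_stack([z + 0.5 * rng.normal(size=N), rng.normal(size=N)])
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

Sxx, Syy, Sxy = X.T @ X, Y.T @ Y, X.T @ Y
# Eliminating B between (74) and (75): X'Y (Y'Y)^-1 Y'X A = rho^2 X'X A
M = np.linalg.solve(Sxx, Sxy @ np.linalg.solve(Syy, Sxy.T))
vals, vecs = np.linalg.eig(M)
i = np.argmax(vals.real)
rho = np.sqrt(vals.real[i])                   # largest canonical correlation
A = vecs[:, [i]].real
A /= np.sqrt((A.T @ Sxx @ A)[0, 0])           # scale so that A'X'XA = 1
B = np.linalg.solve(Syy, Sxy.T @ A) / rho     # from (75); B'Y'YB = 1 then follows

ok74 = np.allclose(Sxy @ B, rho * Sxx @ A)
ok75 = np.allclose(Sxy.T @ A, rho * Syy @ B)
ok_scale = np.isclose((B.T @ Syy @ B)[0, 0], 1.0)
r = np.corrcoef((X @ A).ravel(), (Y @ B).ravel())[0, 1]
print(ok74, ok75, ok_scale, np.isclose(r, rho))  # True True True True
```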

A third application is to orthogonal regression. The situation is very similar to that of the first illustration, but the errors are measured orthogonal to the plane of best fit. As before, we take the variates as measured from their means and so have the basic equation

$$D = \frac{b_1x_1 + b_2x_2 + \cdots + b_kx_k}{\sqrt{b_1^2 + b_2^2 + \cdots + b_k^2}}. \tag{76}$$

This can be written as

$$D = l_1x_1 + l_2x_2 + \cdots + l_kx_k = XL \quad \text{with} \quad L'L = 1. \tag{77}$$

It follows that the quantity to be minimized is

$$D'D = L'X'XL. \tag{78}$$

With the use of Lagrange multipliers we have

$$\phi = L'X'XL + \lambda(1 - L'L), \tag{79}$$

so that

$$\frac{\partial \phi}{\partial (L)} = J'X'XL + L'X'XJ - \lambda(J'L + L'J), \tag{80}$$
$$\frac{\partial (\phi)}{\partial L} = X'XLK' + X'XLK - \lambda(LK' + LK), \tag{81}$$

from which

$$2X'XL - 2\lambda L = 0, \tag{82}$$

and the values can be determined from the equation

$$(X'X - \lambda)L = 0. \tag{83}$$

The solution continues with the use of the characteristic equation.

It is to be noted from (79) and (82) that

$$D'D = L'X'XL = \lambda L'L = \lambda,$$

so that (83) becomes

$$(X'X - D'D)L = 0. \tag{84}$$
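Since $D'D = \lambda$, the plane of best fit corresponds to the smallest root of the characteristic equation of $X'X$. The NumPy sketch below uses synthetic centered data as an assumption for illustration. It checks (83), checks the identity $D'D = \lambda$, and checks that no random unit vector does better.

```python
import numpy as np

rng = np.random.default_rng(8)
N = 100
t = rng.normal(size=N)
X = np.column_stack([t, 2 * t, -t]) + 0.1 * rng.normal(size=(N, 3))   # points near a line
X -= X.mean(axis=0)                                                  # measured from their means

evals, evecs = np.linalg.eigh(X.T @ X)   # roots of the characteristic equation, ascending
lam = evals[0]                           # smallest root: D'D = lambda is to be minimized
L = evecs[:, [0]]                        # unit vector, so L'L = 1
D = X @ L                                # orthogonal distances, as in (77)

ok83 = np.allclose((X.T @ X - lam * np.eye(3)) @ L, 0.0, atol=1e-8)
ok_dd = np.isclose((D.T @ D)[0, 0], lam)
trial = rng.normal(size=(3, 1000))
trial /= np.linalg.norm(trial, axis=0)   # random unit vectors
no_better = bool(np.all(np.sum((X @ trial) ** 2, axis=0) >= lam - 1e-9))
print(ok83, ok_dd, no_better)  # True True True
```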

A fourth illustration uses symbolic derivatives in obtaining the principal components of a total variance [5, 252]. The variable portion of the exponent of the multivariate normal can be written $Y'AY$, where $Y$ is the column vector $[y_1, \ldots, y_k]'$ and $A$ is a $k$ by $k$ matrix. We set this equal to a constant, say $C$, and get the equation of the $k$ dimensional ellipsoid. It is desired to locate the extrema of this ellipsoid. To do this we find the extrema of $Y'Y$. Using the Lagrange multiplier we have

$$\phi = Y'Y + \lambda(C - Y'AY), \tag{85}$$

so that

$$\frac{\partial \phi}{\partial (Y)} = J'Y + Y'J - \lambda(J'AY + Y'AJ), \tag{86}$$
$$\frac{\partial (\phi)}{\partial Y} = YK' + YK - \lambda(AYK' + A'YK), \tag{87}$$

so that, since $A$ is symmetric and $K = K' = 1$, there results

$$Y - \lambda AY = 0. \tag{88}$$

Pre-multiplying by $A^{-1}$ we get

$$(A^{-1} - \lambda)Y = 0, \tag{89}$$

and pre-multiplying (88) by $Y'$ gives the important relation

$$Y'Y = \lambda C. \tag{90}$$
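For a symmetric positive definite $A$, the extrema lie along the eigenvectors of $A$. If $Av = \mu v$ with $v'v = 1$, then $Y = \sqrt{C/\mu}\,v$ lies on the ellipsoid and satisfies (88) with $\lambda = 1/\mu$. The NumPy sketch below uses a randomly generated $A$ as an illustrative assumption and checks (88) to (90) along each axis.

```python
import numpy as np

rng = np.random.default_rng(9)
k, C = 3, 2.0
G = rng.normal(size=(k, k))
A = G @ G.T + k * np.eye(k)      # symmetric positive definite, like the matrix of a normal exponent
mu, V = np.linalg.eigh(A)        # A v = mu v with unit eigenvectors v

checks = []
for j in range(k):
    Yj = np.sqrt(C / mu[j]) * V[:, [j]]   # point on Y'AY = C along the j-th principal axis
    lam = 1.0 / mu[j]                     # multiplier that satisfies (88)
    checks += [
        np.isclose((Yj.T @ A @ Yj)[0, 0], C),                  # on the ellipsoid
        np.allclose(Yj - lam * A @ Yj, 0.0),                   # (88)
        np.allclose(np.linalg.solve(A, Yj) - lam * Yj, 0.0),   # (89)
        np.isclose((Yj.T @ Yj)[0, 0], lam * C),                # (90)
    ]
print(all(checks))  # True
```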

A fifth illustration utilizes symbolic differentiation in developing the theory of the linear discriminant function [6, 341], [8, 124]. As in the other illustrations, the variates are measured about their means. The unknown multipliers are indicated by the vector $L$. Then

$$Z = XL \tag{91}$$

is the general matrix equation, while

$$Z_1 = X_1L, \qquad Z_2 = X_2L \tag{92}$$

are the corresponding equations for the two groups. Then

$$\bar{Z}_1 = \bar{X}_1L, \quad \bar{Z}_2 = \bar{X}_2L, \quad \text{and} \quad \bar{Z}_1 - \bar{Z}_2 = (\bar{X}_1 - \bar{X}_2)L = DL, \tag{93}$$
$$Z_1 - \bar{Z}_1 = (X_1 - \bar{X}_1)L = Y_1L, \qquad Z_2 - \bar{Z}_2 = (X_2 - \bar{X}_2)L = Y_2L. \tag{94}$$

The between group variation, $L'D'DL$, is then divided by the within group variation, $L'Y_1'Y_1L + L'Y_2'Y_2L$, to get

$$G = \frac{L'D'DL}{L'Y_1'Y_1L + L'Y_2'Y_2L} = \frac{A}{B}. \tag{95}$$

We wish to maximize $G$. Since $A$ and $B$ are scalars, setting the derivative of $G$ equal to zero gives

$$\frac{\partial (B)}{\partial L} = \frac{1}{G}\frac{\partial (A)}{\partial L}, \tag{96}$$

which becomes, with further differentiation,

$$(Y_1'Y_1 + Y_2'Y_2)L = D'\left(\frac{DL}{G}\right). \tag{97}$$

Since $DL/G$ is a scalar, (97) reduces to

$$(Y_1'Y_1 + Y_2'Y_2)L = cD'. \tag{98}$$

Any convenient value of $c$ can be used for purposes of discrimination. It is customary to take $c = 1$ and then to adjust (98) so that some $l_i$ is unity.
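Taking $c = 1$ in (98) gives $L = (Y_1'Y_1 + Y_2'Y_2)^{-1}D'$. The NumPy sketch assumes synthetic two-group data and treats $D$ as the $1 \times k$ row of differences between the group mean vectors. It checks that rescaling $L$ does not change $G$, and that no random direction gives a larger $G$.

```python
import numpy as np

rng = np.random.default_rng(10)
X1 = rng.normal(size=(40, 3)) + np.array([1.0, 0.0, 0.5])   # group 1 (illustrative data)
X2 = rng.normal(size=(60, 3))                               # group 2
Y1 = X1 - X1.mean(axis=0)                                   # deviations within group 1
Y2 = X2 - X2.mean(axis=0)                                   # deviations within group 2
D = (X1.mean(axis=0) - X2.mean(axis=0))[None, :]            # 1 x 3 row of mean differences
W = Y1.T @ Y1 + Y2.T @ Y2                                   # within-group matrix

def G(L):
    """The ratio (95): between-group over within-group variation of the form XL."""
    return (L.T @ D.T @ D @ L)[0, 0] / (L.T @ W @ L)[0, 0]

L = np.linalg.solve(W, D.T)      # (98) with c = 1
L_unit = L / L[0, 0]             # rescaled so that l_1 = 1; G is unchanged by scaling
trials = rng.normal(size=(3, 500))
ok_scale = np.isclose(G(L), G(L_unit))
ok_max = all(G(L) >= G(trials[:, [j]]) - 1e-12 for j in range(500))
print(ok_scale, ok_max)  # True True
```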

A final illustration applies symbolic matrix differentiation to a theorem of multiple factor analysis. This presentation parallels that given by Thurstone [7, 473-477] for transforming any factorial matrix into a principal axes matrix. The matrix

$$F = [a_{ij}] \tag{99}$$

has $p$ rows and $r$ columns, $r < p$, such that

$$FF' = R, \tag{100}$$

where $R$ is a $p \times p$ correlation matrix.

It is desired to apply the unitary orthogonal transformation $L$ to $F$ in such a way as to produce a matrix, called $F_p$, which has the sums of the squares in respective columns a maximum. This can be done by maximizing simultaneously the diagonal terms of $F_p'F_p$, where

$$F_p = FL. \tag{101}$$

Again using Lagrange multipliers, we have

$$\phi = L'F'FL + \lambda(I - L'L). \tag{102}$$

This equation has the same analytical form as (79). Differentiation leads to the result

$$(F'F - \lambda)L = 0. \tag{103}$$

The solution of (103) gives the value $L$, which can be substituted in (101) to obtain $F_p$.
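The solution of (103) amounts to an eigen-decomposition of $F'F$. This NumPy sketch uses an arbitrary random $F$ as an illustrative assumption and orders the roots from largest to smallest. It checks that $L$ is orthogonal, that $FF' = R$ is unchanged, and that the column sums of squares of $F_p$ equal the roots.

```python
import numpy as np

rng = np.random.default_rng(11)
p, r = 6, 3
F = rng.uniform(-0.6, 0.6, size=(p, r))   # an arbitrary p x r factorial matrix (illustrative)
roots, L = np.linalg.eigh(F.T @ F)        # columns of L satisfy (F'F - lambda)L = 0
roots, L = roots[::-1], L[:, ::-1]        # largest root first
Fp = F @ L                                # (101)

ok_orth = np.allclose(L.T @ L, np.eye(r))              # L is orthogonal
ok_R = np.allclose(Fp @ Fp.T, F @ F.T)                 # FF' = R is unchanged
ok_cols = np.allclose((Fp ** 2).sum(axis=0), roots)    # column sums of squares equal the roots
print(ok_orth, ok_R, ok_cols)  # True True True
```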

16. Conclusion. Two types of symbolic matrix derivatives have been defined. Laws have been developed for the basic operations of addition, subtraction, multiplication, inverse, and powers. Laws for more extended functions can be worked out on the basis of the principles enunciated.

Applications are given to certain multivariate problems. It is our thesis that with these differentiation formulas available, much work in multivariate analysis can be carried on with a simple matrix notation.

ON THE LIMITING DISTRIBUTIONS OF ESTIMATES BASED ON SAMPLES FROM FINITE UNIVERSES¹

By William G. Madow 
Institute of Statistics, University of North Carolina 

1. Summary. The paper shows that under very broad conditions the usual theorems concerning the limiting distributions of estimates hold for estimates based on samples selected from finite universes at random without replacement. It may be remarked that under the same conditions, the same conclusions are true for random sampling from finite universes with replacement, if the universes are permitted to change within the limitations set by condition W.

2. Introduction. It has long been known that the limiting distribution of arithmetic means of samples selected at random with replacement from finite universes, or from infinite universes, is normal under very general conditions. When, however, a sample is selected from a finite universe without replacement, and the size of the sample as compared with that of the universe is too large for the universe to be treated as infinite, the proof that the limiting distribution of the mean is normal appears to have been given only for the case where the universe is multinomial.² In this paper we prove that the limiting distribution of the mean is normal provided only that, as the universe increases in size, the higher moments do not increase too rapidly as compared with the variance, and that for sufficiently large sizes of sample and population the ratio of size of sample to size of universe is bounded away from 1. Various extensions are given, but these are almost immediate consequences of the theorem on the limiting distribution of the mean.

The method used is that of showing that the moments of the standardized mean tend to those of the normal distribution. In doing this we generalize a theorem of Wald and Wolfowitz,³ by making it applicable to permutations of samples from finite populations, and by reducing a little the conditions on the coefficients. The theorem on the mean is then a simple corollary.

We also note that with these proofs on limiting distributions we can make the 
corresponding assertions concerning characteristic functions. Although no 
applications of this fact are given, it seems likely that some useful results could
be obtained.

3. Preliminary lemmas. In calculating the $k$-th moments and their limits we
shall use an infrequently given form of the multinomial expansion and some
properties of symmetric polynomials. In this section we make the necessary
definitions, and present four lemmas embodying the results we shall use.⁴

¹ Presented to the American Mathematical Society at a meeting held in New York City
on April 17, 1948.

² See F. N. David, "Limiting distributions connected with certain methods of sampling
human populations," Stat. Res. Mem., Vol. 2 (1938), pp. 69-90, especially p. 77.

³ A. Wald and J. Wolfowitz, "Statistical tests based on permutations of the observations,"
Annals of Math. Stat., Vol. 15 (1944), pp. 358-372, especially p. 359.

A $t$-partition of a positive integer $k$ consists of $t$ positive integers $\alpha_1, \dots, \alpha_t$
such that $\alpha_1 + \dots + \alpha_t = k$. Two $t$-partitions $\alpha_1, \dots, \alpha_t$ and $\beta_1, \dots, \beta_t$ of
$k$ will be said to be distinct if for at least one value of $h$ we have $\alpha_h \neq \beta_h$.

Let $\varphi(\alpha_1, \dots, \alpha_t)$, written $\varphi(\alpha)$, be any function of the $t$-partitions of $k$. By
$\sum_{t,k} \varphi(\alpha)$ we shall mean the summation of $\varphi(\alpha_1, \dots, \alpha_t)$ over all distinct $t$-partitions
of $k$.

By $\sum_S \varphi(\alpha)$ we shall mean the summation of $\varphi(\alpha)$ over all distinct permutations
of $\alpha_1, \dots, \alpha_t$.

By $\sum'_{t,k} \varphi(\alpha)$ we shall mean the summation of $\varphi(\alpha)$ over all distinct $t$-partitions
of $k$ satisfying the condition $\alpha_1 \geq \alpha_2 \geq \dots \geq \alpha_t$.

Let $\psi(\nu_1, \dots, \nu_t)$ be any function of the variables $\nu_1, \dots, \nu_t$. Then by
$\sum_{(t)} \psi(\nu_1, \dots, \nu_t)$ we shall mean the summation of $\psi(\nu_1, \dots, \nu_t)$ over all possible
selections of $t$ integers from 1 to $n$ arranged so that $\nu_1 > \nu_2 > \dots > \nu_t$.

The formula for the multinomial given below is not presented as a new result.
It is given only as a means of referring to the result we need.

Lemma 1. Let $\xi_1, \dots, \xi_n$ be any quantities or random variables and let $k$ be a
positive integer. Then

$$(\xi_1 + \dots + \xi_n)^k = \sum_{t=1}^{k} \sum_{t,k} C^k_{\alpha_1,\dots,\alpha_t} \sum_{(t)} \xi_{\nu_1}^{\alpha_1} \cdots \xi_{\nu_t}^{\alpha_t},$$

where

$$C^k_{\alpha_1,\dots,\alpha_t} = \frac{k!}{\alpha_1! \cdots \alpha_t!}.$$

The proof is omitted.
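Although the proof is omitted, Lemma 1 is easy to check numerically. The following minimal Python sketch expands $(\xi_1 + \dots + \xi_n)^k$ exactly as the lemma prescribes: it sums over all $t$-partitions of $k$ (ordered tuples of positive integers), weights each by $C^k_{\alpha}$, and sums over index selections arranged so that $\nu_1 > \dots > \nu_t$. It then compares the result with the direct power. The helper names are illustrative and do not come from the paper. Exact rational arithmetic (`fractions.Fraction`) is used so that the comparison is exact.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial, prod


def compositions(k, t):
    """All t-partitions of k in the paper's sense: ordered t-tuples of positive integers summing to k."""
    if t == 1:
        yield (k,)
        return
    for first in range(1, k - t + 2):
        for rest in compositions(k - first, t - 1):
            yield (first,) + rest


def multinomial_coeff(alpha):
    """C^k_{alpha_1,...,alpha_t} = k! / (alpha_1! ... alpha_t!)."""
    return factorial(sum(alpha)) // prod(factorial(a) for a in alpha)


def lemma1_rhs(xi, k):
    """Right-hand side of Lemma 1 for the quantities xi_1, ..., xi_n."""
    n = len(xi)
    total = Fraction(0)
    for t in range(1, min(k, n) + 1):
        for alpha in compositions(k, t):           # sum over t-partitions of k
            coeff = multinomial_coeff(alpha)
            for nus in combinations(range(n), t):  # each set of t indices exactly once
                nus = nus[::-1]                    # arranged so that nu_1 > ... > nu_t
                term = prod((xi[v] ** a for v, a in zip(nus, alpha)), start=Fraction(1))
                total += coeff * term
    return total
```

Each set of $t$ distinct indices is visited once, and the ordered partition supplies the exponents. Together these generate every monomial of the expansion exactly once.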

The following lemma will be useful in connection with several of the results of
this section:

Lemma 2. If $\varphi(\alpha)$ is a function of the $t$-partitions of $k$, then

$$\sum_{t,k} \varphi(\alpha) = \sum'_{t,k} \sum_S \varphi(\alpha).$$

The verbalization of the lemma is practically its proof.

Let us now define certain symmetric polynomials that we shall use.
Let $S_{\alpha_1,\dots,\alpha_t} = \sum \xi_{\nu_1}^{\alpha_1} \cdots \xi_{\nu_t}^{\alpha_t}$, where the $\alpha$'s are positive integers and the summation
extends over all possible arrangements in $\nu_1, \dots, \nu_t$ of $t$ of the integers
$1, \dots, N$. Hence there will be $N^{(t)} = N(N-1)\cdots(N-t+1)$ terms in
$S_{\alpha_1,\dots,\alpha_t}$.

Lemma 3. Suppose that $t_1, \dots, t_h$ are an $h$-partition of $t$, that

$$\alpha_{t_1+\dots+t_{i-1}+1} = \dots = \alpha_{t_1+\dots+t_i}, \qquad (i = 1, \dots, h;\ t_0 = 0),$$

and that

$$\alpha_1 \neq \alpha_{t_1+1} \neq \dots \neq \alpha_{t_1+\dots+t_{h-1}+1}.$$

Then, defining

$$S'_{\alpha_1,\dots,\alpha_t} = \sum_S \sum_{(t)} \xi_{\nu_1}^{\alpha_1} \cdots \xi_{\nu_t}^{\alpha_t}, \tag{3.1}$$

it follows that

$$S_{\alpha_1,\dots,\alpha_t} = t_1! \cdots t_h!\, S'_{\alpha_1,\dots,\alpha_t}.$$

To prove Lemma 3, it is only necessary to note that each term of $S'_{\alpha_1,\dots,\alpha_t}$ will
determine $t_1! \cdots t_h!$ equal terms of $S_{\alpha_1,\dots,\alpha_t}$.

⁴ The order of sections 3 and 4 is largely a matter of taste; some may prefer to treat
section 3 as an appendix to section 4 to be referred to when necessary.
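Lemma 3 can be illustrated by brute force. The sketch below computes $S_{\alpha}$ as a sum over all ordered selections of distinct indices. It computes $S'_{\alpha}$ of (3.1) as a sum over the distinct permutations of $\alpha$, followed by a sum over decreasing index selections. It then checks that the two differ by the factor $t_1! \cdots t_h!$. Here the $t_i$ are read off as the multiplicities of the distinct values among the $\alpha$'s; grouping equal $\alpha$'s into consecutive blocks, as the lemma does, gives the same product. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations
from math import factorial, prod


def S(xi, alpha):
    """S_{alpha_1..alpha_t}: sum over all arrangements of t distinct indices (N^(t) terms)."""
    t = len(alpha)
    return sum(prod(xi[v] ** a for v, a in zip(nus, alpha))
               for nus in permutations(range(len(xi)), t))


def S_prime(xi, alpha):
    """S'_{alpha} of (3.1): distinct permutations of alpha, then nu_1 > ... > nu_t."""
    t = len(alpha)
    total = 0
    for beta in set(permutations(alpha)):            # Sigma_S
        for nus in combinations(range(len(xi)), t):  # Sigma_(t)
            nus = nus[::-1]
            total += prod(xi[v] ** b for v, b in zip(nus, beta))
    return total


def block_factor(alpha):
    """t_1! ... t_h!, where t_i is the multiplicity of each distinct value among the alphas."""
    return prod(factorial(m) for m in Counter(alpha).values())
```

For example, with $\alpha = (2, 2, 1)$ the factor is $2!\,1! = 2$, and $S_{2,2,1} = 2 S'_{2,2,1}$ for any choice of integer $\xi$'s.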

Although the moments that we shall obtain will be functions of $S_{\alpha_1,\dots,\alpha_t}$, the
condition that we shall use on the moments can be interpreted directly only in
terms of $S_r$. Consequently, in order to be able to analyze the implications of
that condition on $S_{\alpha_1,\dots,\alpha_t}$, we state the following lemma:

Lemma 4. The symmetric polynomial $S_{\alpha_1,\dots,\alpha_t}$ is equal to a sum of products of
the form

$$\pm S_{\gamma_1} \cdots S_{\gamma_h},$$

where $\gamma_1, \dots, \gamma_h$ are an $h$-partition of $k$, $h \leq t$, and each $\gamma$ is a sum of one or more
of the $\alpha$'s. Furthermore, if $S_1 = 0$, then $h \leq [k/2]$, where $[k/2] = k/2$ if $k$ is even
and $[k/2] = (k-1)/2$ if $k$ is odd. This follows from the result

$$S_{\alpha_t} S_{\alpha_1,\dots,\alpha_{t-1}} = S_{\alpha_1,\dots,\alpha_t} + S_{\alpha_1+\alpha_t,\alpha_2,\dots,\alpha_{t-1}} + \dots + S_{\alpha_1,\dots,\alpha_{t-2},\alpha_{t-1}+\alpha_t}. \tag{3.2}$$

Proof: It is easy to prove (3.2) by comparing terms. Then the other assertions
follow from the repeated use of (3.2) and the resulting fact that each $\gamma$ is a
sum of one or more of the $\alpha$'s.
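The identity (3.2) can be confirmed by comparing terms mechanically. In this sketch (hypothetical helper names), the left side multiplies $S_{\alpha_t}$ by $S_{\alpha_1,\dots,\alpha_{t-1}}$. The right side adds $S_{\alpha_1,\dots,\alpha_t}$ to every polynomial in which $\alpha_t$ has been merged into one of the other $\alpha$'s. All polynomials are evaluated by brute force on small integer inputs.

```python
from itertools import permutations
from math import prod


def S(xi, alpha):
    """S_{alpha_1..alpha_t}: sum over all arrangements of t distinct indices."""
    return sum(prod(xi[v] ** a for v, a in zip(nus, alpha))
               for nus in permutations(range(len(xi)), len(alpha)))


def lhs_32(xi, alpha):
    """S_{alpha_t} * S_{alpha_1..alpha_{t-1}}."""
    *head, last = alpha
    return S(xi, (last,)) * S(xi, tuple(head))


def rhs_32(xi, alpha):
    """S_{alpha_1..alpha_t} plus every term in which alpha_t is merged into one alpha_i."""
    *head, last = alpha
    total = S(xi, tuple(alpha))
    for i in range(len(head)):
        merged = list(head)
        merged[i] += last
        total += S(xi, tuple(merged))
    return total
```

When the $\xi$'s sum to zero (so that $S_1 = 0$), the left side of (3.2) vanishes whenever $\alpha_t = 1$. For instance, this gives $S_{2,1} = -S_3$, which is the mechanism the lemma uses to reduce $h$.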


4. The limiting distribution. In this section we obtain the generalization of
the theorem of Wald and Wolfowitz to which reference was made above.

Let $U_1, U_2, \dots, U_N, \dots$ be a sequence of universes, the universe $U_N$ containing
the elements⁵ $x_{\nu N}$, and let the arithmetic mean of the elements of $U_N$ be
denoted by $\bar{x}_N$. Furthermore, let

$$\mu_{rN} = \mu_r(U_N) = \frac{1}{N} \sum_\nu (x_{\nu N} - \bar{x}_N)^r.$$

Let $C_1, C_2, \dots, C_n, \dots$ be a sequence of sets of coefficients, the set $C_n$ containing
the elements $c_{jn}$, and let the arithmetic mean of the elements of $C_n$ be
denoted by $\bar{c}_n$. We exclude the possibility that the elements of any $C_n$ all vanish,
and hence we can suppose that $\sum_j c_{jn}^2 = 1$. Furthermore, let

$$\mu'_{rn} = \frac{1}{n} \sum_j (c_{jn} - \bar{c}_n)^r.$$

Since $\sum_j (c_{jn} - \bar{c}_n)^2 \geq 0$, it follows that, if we define $A_n = n^{1/2} \bar{c}_n$, then $A_n^2 \leq 1$.

⁵ The letter $\nu$ will assume all integral values from 1 to $N$. The letter $r$ will assume all
positive integral values. The letter $j$ will assume all integral values from 1 to $n$. The letter
$t$ will assume all integral values from 1 to $k$. The symbol lim will stand for the limit as $n$ or
$N$ or both, as the case may be, increase without limit, it being understood that $\lim n/N < 1$.

Let $n$ elements be selected at random without replacement from $U_N$ and let
us denote these elements by $x'_{jN}$, the subscript $j$ indicating the order of selection,
i.e., $x'_{jN}$ is the $j$-th element of $U_N$ selected for the sample even though it may be
$x_{\nu N}$ with $\nu \neq j$.

The linear function that we shall study is

$$z_n = c_{1n} x'_{1N} + \dots + c_{nn} x'_{nN},$$

i.e., the value of $z_n$ is determined by multiplying the $j$-th element selected for the
sample by $c_{jn}$ and summing for $j$. Then, since $E x'_{jN} = \bar{x}_N$, we have

$$E z_n = n \bar{x}_N \bar{c}_n.$$

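Furthermore,

$$\sigma_{z_n}^2 = \frac{N \mu_{2N}}{N-1} \left(1 - \frac{n A_n^2}{N}\right).$$

To see this we first note that

$$\sum_{i \neq j} c_{in} c_{jn} = n^2 \bar{c}_n^2 - 1 = n A_n^2 - 1,$$

$$E(x'_{jN} - \bar{x}_N)^2 = \mu_{2N},$$

and, if $i \neq j$,

$$E(x'_{iN} - \bar{x}_N)(x'_{jN} - \bar{x}_N) = -\frac{\mu_{2N}}{N-1}.$$

From the definition of variance we have

$$\sigma_{z_n}^2 = E(z_n - E z_n)^2 = \sum_{i,j=1}^{n} c_{in} c_{jn} E(x'_{iN} - \bar{x}_N)(x'_{jN} - \bar{x}_N),$$

and making the indicated substitutions the result follows from a few simple
manipulations.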
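The mean and variance of $z_n$ can be checked exactly on a toy universe. The sketch below averages over all ordered samples drawn without replacement, which is exhaustive and therefore only feasible for very small $N$. It compares the results with $n \bar{x}_N \bar{c}_n$ and $\frac{N\mu_{2N}}{N-1}(1 - nA_n^2/N)$. The helper names are hypothetical. Rescaling the $c_{jn}$ to $\sum_j c_{jn}^2 = 1$ would require irrational square roots, so the sketch instead carries the factor $\sum_j c_{jn}^2$ explicitly and uses $A_n^2 = n \bar{c}_n^2 / \sum_j c_{jn}^2$. This reduces to the paper's normalization when $\sum_j c_{jn}^2 = 1$.

```python
from fractions import Fraction
from itertools import permutations


def exact_moments_of_z(x, c):
    """Exact E z_n and Var z_n by enumerating every ordered sample without replacement."""
    n = len(c)
    samples = list(permutations(x, n))  # all N^(n) ordered samples are equally likely
    zs = [sum(cj * xj for cj, xj in zip(c, s)) for s in samples]
    mean = sum(zs, Fraction(0)) / len(zs)
    var = sum(((z - mean) ** 2 for z in zs), Fraction(0)) / len(zs)
    return mean, var


def formula_moments(x, c):
    """E z_n = n xbar_N cbar_n and the variance formula, without assuming sum c_j^2 = 1."""
    N, n = len(x), len(c)
    xbar = sum(x, Fraction(0)) / N
    cbar = sum(c, Fraction(0)) / n
    mu2 = sum(((v - xbar) ** 2 for v in x), Fraction(0)) / N
    sum_c2 = sum((cj ** 2 for cj in c), Fraction(0))
    A2 = n * cbar ** 2 / sum_c2        # A_n^2 after rescaling c so that sum c_j^2 = 1
    mean = n * xbar * cbar
    var = sum_c2 * Fraction(N, N - 1) * mu2 * (1 - n * A2 / N)
    return mean, var
```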

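If we define $\bar{x}_n$ to be the arithmetic mean of $x'_{1N}, \dots, x'_{nN}$, then $n^{1/2} \bar{x}_n$ is the special
case of $z_n$ in which $n^{1/2} c_{jn} = 1$ (so that $A_n = 1$), and, as is well known,

$$E \bar{x}_n = \bar{x}_N, \qquad \sigma_{\bar{x}_n}^2 = \frac{N-n}{N-1} \cdot \frac{\mu_{2N}}{n}.$$

Hence, if we can find the limiting distribution of

$$Z_n = \frac{z_n - E z_n}{\sigma_{z_n}},$$

then the limiting distribution of $(\bar{x}_n - \bar{x}_N)/\sigma_{\bar{x}_n}$ will be a special case.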
We shall need to place some sort of limitation on the sequences $U_N$ and $C_n$ if
we are to obtain theorems on limiting distributions of statistics based on them.
The condition W that we shall use is satisfied by a slightly larger class of sequences
$U_N$ and $C_n$ than that of Wald and Wolfowitz, because it does not rule out
the possibility that all the elements of $C_n$ should be equal. It should be noted,
however, that for their purposes this extension of the class of sequences
$U_N$ and $C_n$ is vacuous, since they required $n = N$. In their case, if all the
elements of $C_n$ were equal, say $k/N$, we would have $z_N = k \bar{x}_N$ no matter in what
order the elements of $U_N$ were selected for the sample.

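Condition W. The sequences $U_N$ and $C_n$ will satisfy the condition W if

$$\mu_{rN} = \mu_{2N}^{r/2} \lambda_r(N), \qquad \mu'_{rn} = n^{-r/2} \lambda'_r(n),$$

for sufficiently large $n$ and $N$, where a finite value $\lambda$ exists such that for all $r$

$$\sup |\lambda_r(N)| < \lambda, \qquad \sup |\lambda'_r(n)| < \lambda,$$

and if $n A_n^2 / N < 1 - \epsilon$ for sufficiently large $n$ and $N$, where $\epsilon > 0$.

(Note that if W is satisfied for all even values of $r$, then W is also satisfied for all
odd values of $r$, since $\mu_{r-1,N}\, \mu_{r+1,N} \geq \mu_{rN}^2$ for odd $r$ by the Cauchy-Schwarz inequality.)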

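A general theorem on moments is the following:

Theorem 1. Let $S_r$ and $S_{\alpha_1,\dots,\alpha_t}$ be defined in terms of $x_{\nu N} - \bar{x}_N$
instead of $\xi_\nu$, and let $T'_{\alpha_1,\dots,\alpha_t}$ be the same function of the $c_{jn}$ that $S'_{\alpha_1,\dots,\alpha_t}$ is of the
$x_{\nu N} - \bar{x}_N$. Furthermore, let $E_k = E Z_n^k$. Then

$$E_k = \frac{1}{\sigma_{z_n}^k} \sum_{t=1}^{k} \sum'_{t,k} C^k_{\alpha_1,\dots,\alpha_t} \frac{S_{\alpha_1,\dots,\alpha_t}\, T'_{\alpha_1,\dots,\alpha_t}}{N^{(t)}}. \tag{4.1}$$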


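Proof: From the definition of $Z_n$ and Lemma 1, it follows that

$$\sigma_{z_n}^k E_k = \sum_{t=1}^{k} \sum_{t,k} C^k_{\alpha_1,\dots,\alpha_t} \sum_{(t)} c_{j_1 n}^{\alpha_1} \cdots c_{j_t n}^{\alpha_t}\, E(x'_{j_1 N} - \bar{x}_N)^{\alpha_1} \cdots (x'_{j_t N} - \bar{x}_N)^{\alpha_t}.$$

Since we are selecting at random without replacement, it follows that

$$N^{(t)} E(x'_{j_1 N} - \bar{x}_N)^{\alpha_1} \cdots (x'_{j_t N} - \bar{x}_N)^{\alpha_t} = S_{\alpha_1,\dots,\alpha_t}.$$

If we now use Lemma 2 to replace $\sum_{t,k}$ by $\sum'_{t,k} \sum_S$, we then obtain

$$\sigma_{z_n}^k E_k = \sum_{t=1}^{k} \sum'_{t,k} C^k_{\alpha_1,\dots,\alpha_t} \frac{S_{\alpha_1,\dots,\alpha_t}}{N^{(t)}} \sum_S \sum_{(t)} c_{j_1 n}^{\alpha_1} \cdots c_{j_t n}^{\alpha_t},$$

since both $C^k_{\alpha_1,\dots,\alpha_t}$ and $S_{\alpha_1,\dots,\alpha_t}$ are invariant under permutations of $\alpha_1, \dots,
\alpha_t$. Then from (3.1) and the definition of $T'_{\alpha_1,\dots,\alpha_t}$, it follows that (4.1) is
proved.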
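Formula (4.1) can be tested end to end on a toy example. The sketch below compares two quantities. The first is $\sigma_{z_n}^k E_k = E(z_n - E z_n)^k$, obtained by enumerating every ordered sample without replacement. The second is the right-hand side of (4.1), built from descending $t$-partitions, $S_{\alpha}$ of the deviations $x_{\nu N} - \bar{x}_N$, and $T'_{\alpha}$ of the coefficients. Multiplying through by $\sigma_{z_n}^k$ avoids square roots, so exact rational arithmetic can be kept. All function names are illustrative assumptions. Terms with $t > N$ or $t > n$ are skipped, since both $S_{\alpha}$ and $T'_{\alpha}$ then vanish.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial, prod


def partitions_desc(k, t, max_part=None):
    """t-partitions of k with alpha_1 >= ... >= alpha_t (the range of the sum Sigma'_{t,k})."""
    max_part = k if max_part is None else max_part
    if t == 0:
        if k == 0:
            yield ()
        return
    for first in range(min(k, max_part), 0, -1):
        for rest in partitions_desc(k - first, t - 1, first):
            yield (first,) + rest


def S(d, alpha):
    """S_alpha over ordered selections of distinct indices of the deviations d."""
    return sum(prod((d[v] ** a for v, a in zip(nus, alpha)), start=Fraction(1))
               for nus in permutations(range(len(d)), len(alpha)))


def T_prime(c, alpha):
    """T'_alpha: the same function of the c_j that S'_alpha of (3.1) is of the deviations."""
    total = Fraction(0)
    for beta in set(permutations(alpha)):
        for js in combinations(range(len(c)), len(alpha)):
            total += prod((c[j] ** b for j, b in zip(js[::-1], beta)), start=Fraction(1))
    return total


def rhs_41(x, c, k):
    """sigma^k E_k according to (4.1)."""
    N = len(x)
    xbar = sum(x, Fraction(0)) / N
    d = [v - xbar for v in x]
    total = Fraction(0)
    for t in range(1, k + 1):
        if t > N or t > len(c):
            continue  # S_alpha or T'_alpha is an empty sum
        falling = prod(range(N - t + 1, N + 1))  # N^(t)
        for alpha in partitions_desc(k, t):
            coeff = factorial(k) // prod(factorial(a) for a in alpha)
            total += coeff * S(d, alpha) * T_prime(c, alpha) / falling
    return total


def lhs_direct(x, c, k):
    """E (z_n - E z_n)^k by enumerating all ordered samples without replacement."""
    xbar = sum(x, Fraction(0)) / len(x)
    samples = list(permutations(x, len(c)))
    devs = (sum(cj * (xj - xbar) for cj, xj in zip(c, s)) for s in samples)
    return sum((dv ** k for dv in devs), Fraction(0)) / len(samples)
```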
Our fundamental theorem is:

Theorem 2. If the sequences $U_N$ and $C_n$ satisfy the condition W, then

$$\lim E_{2j+1} = 0,$$

and

$$\lim E_{2j} = \frac{(2j)!}{2^j j!},$$

so that, for any $a$,

$$\lim P(Z_n < a) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-x^2/2}\, dx.$$

Proof: We wish to show that $\lim E_k$ exists and has the values given above.
First consider the parts of the typical term of $E_k$ that depend on $n$ and $N$, i.e.,
the expression

$$B = \frac{S_{\alpha_1,\dots,\alpha_t}\, T'_{\alpha_1,\dots,\alpha_t}}{N^{(t)}\, \mu_{2N}^{k/2} \left(\frac{N}{N-1}\right)^{k/2} \left(1 - \frac{n A_n^2}{N}\right)^{k/2}}.$$

Since $\lim E_k$ will be the sum of the limits of a finite number of these terms, let
us first determine under what conditions $B$ will tend to zero as $n$ and $N$ become
infinite.

From Lemma 4 it follows that

$$S_{\alpha_1,\dots,\alpha_t} = \sum \pm S_{\gamma_1} \cdots S_{\gamma_h},$$

where $\gamma_1 + \dots + \gamma_h = \alpha_1 + \dots + \alpha_t$ and each of the $\gamma$'s is the sum of one or more
of the $\alpha$'s. From the definition of $S_{\alpha_1,\dots,\alpha_t}$ in terms of $x_{\nu N} - \bar{x}_N$ it follows that
$S_1 = 0$. Hence the minimum value of all $\gamma$'s in any non-vanishing term of the
summation is 2. Consequently, for all non-vanishing terms, $h \leq
[k/2]$ and $h \leq t$. Finally, if condition W is satisfied, then

$$S_{\gamma_1} \cdots S_{\gamma_h} = N^h \mu_{2N}^{k/2} \lambda(N),$$

where

$$\sup |\lambda(N)| < \lambda^h.$$

Similarly,

$$T'_{\alpha_1,\dots,\alpha_t} = \sum \pm T_{\gamma_1} \cdots T_{\gamma_g},$$

where it may be that $T_1 \neq 0$, so that we cannot require $g \leq [k/2]$ for the term
$T_{\gamma_1} \cdots T_{\gamma_g}$ to be non-vanishing. We still have, however, from Lemma 4 that
$g \leq t$.

If condition W is satisfied, then

$$T_{\gamma_1} \cdots T_{\gamma_g} = n^{g - k/2} \lambda'(n),$$

where

$$\sup |\lambda'(n)| < \infty.$$

Hence, from Lemma 4, the definitions of $\mu_{rN}$ and $\mu'_{rn}$, and condition W, it follows
that $B$ is a sum (the number of terms does not depend on $n$ or $N$) of terms like

$$D = \frac{N^h\, n^{g - k/2}\, \lambda(N)\, \lambda'(n)}{N^{(t)} \left(\frac{N}{N-1}\right)^{k/2} \left(1 - \frac{n A_n^2}{N}\right)^{k/2}},$$

where

$$h \leq [k/2], \qquad h \leq t, \qquad g \leq t,$$

and

$$\sup |\lambda(N)| < \infty, \qquad \sup |\lambda'(n)| < \infty.$$

Since $h \leq t$, it follows that if $g < k/2$ then $\lim D = 0$. Hence, a possibly non-vanishing
term must have $g \geq k/2$, and hence $t \geq k/2$ because $t \geq g$. Furthermore,
$t \geq g + h - k/2$, since $h - k/2 \leq 0$ and $t \geq g$. Hence $t - h \geq g - k/2$.
Now, we can write

$$D = N^{(g - k/2) - (t - h)}\, \lambda(N, n),$$

where

$$\sup |\lambda(N, n)| < \infty,$$

since $n A_n^2 / N < 1 - \epsilon$ for sufficiently large $n$ and $N$ (and $(n/N)^{g - k/2} \leq 1$ because $g \geq k/2$).

Hence

$$\lim D = 0,$$

unless, perhaps, $g - k/2 = t - h$, i.e., $h - k/2 = t - g$. Since $h - k/2 \leq
0$ and $t - g \geq 0$, it follows that we must have $h = k/2$ and $t = g$ for $\lim D$ to be
possibly not zero.

If $k$ is odd, then $h \leq (k-1)/2$ and hence

$$\lim E_{2j+1} = 0,$$

since all terms obtained by expanding it as above will tend to zero.
If $k$ is even, say $k = 2j$, and $\lim D$ is possibly non-vanishing, then $h$ must equal
$j$ and we must have $\gamma_1 = \dots = \gamma_j = 2$. Consequently, from Lemma 4, the
only possibly non-vanishing terms of $E_{2j}$ are those arising from the polynomials
$S_{\alpha_1,\dots,\alpha_t}$ with $\alpha_1 = \dots = \alpha_s = 2$ and $\alpha_{s+1} = \dots = \alpha_t = 1$, so that
$2s + t - s = 2j$, or $t = 2j - s$, $s = 0, 1, \dots, j$. For such values of $\alpha_1, \dots, \alpha_t$
we have

$$C^k_{\alpha_1,\dots,\alpha_t} = \frac{(2j)!}{2^s}.$$
Furthermore, as shown below, in developing $S_{\alpha_1,\dots,\alpha_t}$ by means of Lemma 4 the
coefficient of $S_2^j$ is

$$(-1)^{j-s} \frac{(2j - 2s)!}{2^{j-s} (j-s)!}. \tag{4.2}$$

Demonstration of (4.2): If $s = j$, then it follows from Lemma 4 that the
coefficient of $S_2^j$ is 1. If $s < j$, we use Lemma 4, and, noting that $S_1 = 0$, we
obtain

$$S_{\alpha_1,\dots,\alpha_t} = -S_{\alpha_1+\alpha_t,\alpha_2,\dots,\alpha_{t-1}} - \dots - S_{\alpha_1,\dots,\alpha_{t-2},\alpha_{t-1}+\alpha_t},$$

where, since $\alpha_t = 1$, we have $\alpha_1 + \alpha_t = \dots = \alpha_s + \alpha_t = 3$ and
$\alpha_{s+1} + \alpha_t = \dots = \alpha_{t-1} + \alpha_t = 2$. Consequently, of the $t - 1$ terms of the
above evaluation of $S_{\alpha_1,\dots,\alpha_t}$, exactly $s$ will have $\alpha$'s $> 2$, and $t - s - 1$ will be
of the same form as $S_{\alpha_1,\dots,\alpha_t}$ except that instead of $s$ of the $\alpha$'s being 2 we have
$s + 1$ of the $\alpha$'s equal to 2. For each such $S$ we repeat the process, obtaining

$$S_{\alpha_1,\dots,\alpha_t} = (-1)^{(t-s)/2} (t - s - 1)(t - s - 3) \cdots 3 \cdot 1 \cdot S_2^j + \text{terms which have } h < j.$$

Consequently (4.2) provides the coefficient of $S_2^j$ in $S_{\alpha_1,\dots,\alpha_t}$. Since the other
terms of $S_{\alpha_1,\dots,\alpha_t}$ have $h < j$, they lead to terms of $E_{2j}$ that vanish in the limit.

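Furthermore, by Lemma 3, $T_{\alpha_1,\dots,\alpha_t} = s!\,(t-s)!\,T'_{\alpha_1,\dots,\alpha_t}$, where $T_{\alpha_1,\dots,\alpha_t}$ is the same
function of the $c_{jn}$ that $S_{\alpha_1,\dots,\alpha_t}$ is of the $x_{\nu N} - \bar{x}_N$, and the only term
of $T_{\alpha_1,\dots,\alpha_t}$ for which $g = t$ is

$$T_2^s\, T_1^{2j-2s} = (n A_n^2)^{j-s},$$

since $T_2 = \sum_j c_{jn}^2 = 1$ and $T_1 = \sum_j c_{jn} = n^{1/2} A_n$. The other terms of $T_{\alpha_1,\dots,\alpha_t}$ will lead to terms of $E_{2j}$ that vanish in the limit since
$g < t$. Consequently, eliminating terms known to tend to zero as $n$ and $N$ become
infinite, we see that $E_{2j} - f(n, N)$ tends to zero as $n$ and $N$ become infinite,
where

$$f(n, N) = \sum_{s=0}^{j} \frac{(2j)!}{2^s} \cdot \frac{(-1)^{j-s} (2j-2s)!}{2^{j-s} (j-s)!} \cdot \frac{(n A_n^2)^{j-s}}{s!\,(2j-2s)!} \cdot \frac{N^j}{N^{(2j-s)} \left(\frac{N}{N-1}\right)^{j} \left(1 - \frac{n A_n^2}{N}\right)^{j}}.$$

Now as $n$ and $N$ become infinite with $n < N$, we see that

$$\lim f(n, N) = \lim \frac{(2j)!}{2^j} \sum_{s=0}^{j} \frac{(-1)^{j-s}}{s!\,(j-s)!} \cdot \frac{(n A_n^2 / N)^{j-s}}{(1 - n A_n^2 / N)^j} = \frac{(2j)!}{2^j j!},$$

because $\sum_{s=0}^{j} (-1)^{j-s} y^{j-s} / [s!\,(j-s)!] = (1 - y)^j / j!$; i.e.,

$$\lim E_{2j} = \frac{(2j)!}{2^j j!}.$$

To complete the proof it is only necessary to note that the normal distribution is
completely determined by its moments.⁶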
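The two combinatorial facts at the end of this proof can be checked with exact arithmetic. The first is that, for every $y = nA_n^2/N < 1$, the finite sum in the limit of $f(n, N)$ collapses to $(2j)!/(2^j j!)$. The second is that these values are the even moments $1, 3, 15, 105, \dots$ of the standard normal distribution. The sketch below also evaluates the coefficient (4.2). The function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def normal_moment(j):
    """E X^(2j) for a standard normal X: (2j)! / (2^j j!)."""
    return Fraction(factorial(2 * j), 2 ** j * factorial(j))


def pairing_coefficient(j, s):
    """(4.2): coefficient of S_2^j in S_alpha with s twos and 2j - 2s ones."""
    m = j - s
    return (-1) ** m * Fraction(factorial(2 * m), 2 ** m * factorial(m))


def limit_sum(j, y):
    """(2j)!/2^j * sum_s (-1)^(j-s) y^(j-s) / (s! (j-s)!) / (1 - y)^j, with y = n A_n^2 / N."""
    total = sum(Fraction((-1) ** (j - s), factorial(s) * factorial(j - s)) * y ** (j - s)
                for s in range(j + 1))
    return Fraction(factorial(2 * j), 2 ** j) * total / (1 - y) ** j
```

Up to sign, the coefficient (4.2) is the number of ways to pair off the $2j - 2s$ ones, $(2j-2s-1)(2j-2s-3)\cdots 3 \cdot 1$.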


⁶ See, for example, M. G. Kendall, The Advanced Theory of Statistics, Vol. I, London,
Charles Griffin and Company, page 110.
Since Theorem 2 is a generalization of the theorem of Wald and Wolfowitz,
it is possible to generalize slightly all the applications they make of their theorem.
The statements of these generalizations are omitted.
The application of Theorem 2 that led to this paper is the following: Suppose
that $c_{jn} = n^{-1/2}$. Then the sequence $C_n$ satisfies W and $A_n = 1$. Consequently
we have proved
Corollary 1. If the sequence $U_N$ satisfies the condition W and if $\bar{x}_n$ is the arithmetic
mean of a sample of $n$ elements selected at random without replacement from
$U_N$, then, for all $a$,

$$\lim P\left\{ \frac{n^{1/2} (\bar{x}_n - \bar{x}_N)}{[\mu_{2N} (1 - n/N)]^{1/2}} < a \right\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-x^2/2}\, dx,$$

provided that $\epsilon > 0$ exists such that $n/N < 1 - \epsilon$ if $n$ and $N$ are sufficiently large.
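Corollary 1 can be illustrated by simulation. The following Monte Carlo sketch draws repeated simple random samples without replacement from a fixed, deliberately skewed universe. It standardizes each sample mean exactly as in the corollary and compares the empirical distribution with the normal d.f. at a few points. The universe, the sample sizes, and the seed are arbitrary illustrative choices, not taken from the paper. Because the result is a simulation, agreement is only approximate.

```python
import random
from statistics import NormalDist, fmean


def standardized_means(universe, n, reps, seed=12345):
    """Monte Carlo draws of n^(1/2)(xbar_n - xbar_N) / [mu_2N (1 - n/N)]^(1/2)."""
    rng = random.Random(seed)
    N = len(universe)
    xbar_N = fmean(universe)
    mu2 = fmean([(v - xbar_N) ** 2 for v in universe])
    scale = (mu2 * (1 - n / N)) ** 0.5
    draws = []
    for _ in range(reps):
        sample = rng.sample(universe, n)   # random sampling without replacement
        draws.append(n ** 0.5 * (fmean(sample) - xbar_N) / scale)
    return draws


# A deliberately skewed, hypothetical universe of N = 2000 elements; sampling fraction n/N = 0.3.
universe = [(k % 50) ** 2 for k in range(2000)]
zs = standardized_means(universe, n=600, reps=2000)
for a in (-1.0, 0.0, 1.0):
    empirical = sum(z < a for z in zs) / len(zs)
    print(f"a={a:+.1f}  empirical={empirical:.3f}  normal={NormalDist().cdf(a):.3f}")
```

The denominator uses $\mu_{2N}(1 - n/N)$ as in the corollary rather than the exact finite-population variance. The two differ only by the factor $N/(N-1)$, which tends to 1.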

Now the sequence $U_N$ will certainly satisfy W if $U_N$ has the same moments
for all values of $N$, or if the moments of $U_N$ tend to fixed values as $N$ increases,
or if the universe $U_N$ is a random sample of a universe having these properties.
Consequently Theorem 2 and its corollaries will be valid for many applications,
among them the case studied by F. N. David⁷ in which $U_N$ has the same multinomial
distribution for each value of $N$.

The condition W is immediately satisfied for large classes of changing universes.
For example, if the elements of all $U_N$ are uniformly bounded and

$$\lim \mu_{2N} \neq 0,$$

then the condition W is satisfied. As an illustration, consider the case where
$U_N$ contains $N p_N$ elements having the value one and $N(1 - p_N)$ elements having
the value zero. Then

$$\bar{x}_N = p_N,$$

and

$$\mu_{2N} = p_N (1 - p_N),$$

$$\mu_{rN} = \frac{1}{N} \left[ \sum_{\nu=1}^{N p_N} (1 - p_N)^r + \sum_{\nu = N p_N + 1}^{N} (-p_N)^r \right] = p_N (1 - p_N)^r + (-1)^r (1 - p_N)\, p_N^r.$$

Hence

$$\frac{\mu_{rN}}{\mu_{2N}^{r/2}} = \frac{(1 - p_N)^{r/2}}{p_N^{r/2 - 1}} + (-1)^r \frac{p_N^{r/2}}{(1 - p_N)^{r/2 - 1}},$$

so that condition W will be satisfied if $\epsilon > 0$ exists such that $\epsilon < p_N < 1 - \epsilon$
for all sufficiently large $N$.

Hence the limiting distribution of $Z_n$ will be normal no matter how the proportions
$p_N$ change, provided only that the universe $U_N$ does not come to consist
essentially only of zeros or only of ones.
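The moment formulas for the zero-one universe can be confirmed exactly. The sketch below builds such a universe and checks $\mu_{2N} = p_N(1-p_N)$ and $\mu_{rN} = p_N(1-p_N)^r + (-1)^r(1-p_N)p_N^r$ for several $r$. It also prints the standardized fourth moment $\mu_{4N}/\mu_{2N}^2 = (1-p_N)^2/p_N + p_N^2/(1-p_N)$, which grows without bound as $p_N$ approaches 0. This growth shows why $p_N$ must stay away from 0 and 1. The helper names are hypothetical.

```python
from fractions import Fraction


def zero_one_universe(N, ones):
    """U_N with `ones` elements equal to one and N - ones elements equal to zero."""
    return [1] * ones + [0] * (N - ones)


def central_moment(universe, r):
    """mu_rN = (1/N) sum (x - xbar)^r, computed exactly."""
    N = len(universe)
    xbar = Fraction(sum(universe), N)
    return sum((v - xbar) ** r for v in universe) / N


def closed_form(p, r):
    """p (1 - p)^r + (-1)^r (1 - p) p^r."""
    return p * (1 - p) ** r + (-1) ** r * (1 - p) * p ** r


# mu_4N / mu_2N^2 for proportions p_N = 0.5, 0.05, 0.01 of ones in a universe of 100 elements.
for ones in (50, 5, 1):
    U = zero_one_universe(100, ones)
    print(ones / 100, float(central_moment(U, 4) / central_moment(U, 2) ** 2))
```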


⁷ Op. cit.
Various multivariate extensions of Theorem 2 are immediate. For example:

Theorem 3. Suppose that the elements of $U_N$ are vectors of two components⁸
$(x_{\nu N1}, x_{\nu N2})$, and that the condition W is satisfied by the sequences $C_n$, $U_{N1}$, and
$U_{N2}$, where $U_{Nh}$, $h = 1, 2$, contains the elements $x_{\nu Nh}$. Let

$$z_{nh} = \sum_j c_{jn} x'_{jNh},$$

and let

$$Z_{nh} = \frac{z_{nh} - E z_{nh}}{\sigma_{z_{nh}}},$$

where the random variables $x'_{jNh}$ are defined as were $x'_{jN}$. Let

$$\rho_N = \frac{\sum_\nu (x_{\nu N1} - \bar{x}_{N1})(x_{\nu N2} - \bar{x}_{N2})}{N (\mu_{2N1} \mu_{2N2})^{1/2}},$$

and suppose that $\lim \rho_N$ exists and is equal to $\rho$, where $|\rho| < 1$. Then the
limiting distribution of $Z_{n1}$ and $Z_{n2}$ is bivariate normal with means 0, variances 1,
and correlation coefficient $\rho$.

Proof: To prove Theorem 3 we shall show that any linear function
$t_1 Z_{n1} + t_2 Z_{n2}$ will be normally distributed in the limit if $t_1$ and $t_2$ are not both
zero. It will then follow⁹ that the theorem is true.

If we define $U_N$ to be the sequence whose elements are

$$x_{\nu N} = \frac{t_1 (x_{\nu N1} - \bar{x}_{N1})}{\mu_{2N1}^{1/2}} + \frac{t_2 (x_{\nu N2} - \bar{x}_{N2})}{\mu_{2N2}^{1/2}},$$

then the arithmetic mean of $U_N$ is zero. Let

$$z_n = \sum_j c_{jn} x'_{jN},$$

and let

$$Z_n = \frac{z_n - E z_n}{\sigma_{z_n}}.$$

Then it is readily verified that

$$Z_n = \frac{t_1 Z_{n1} + t_2 Z_{n2}}{\sigma_{t_1 Z_{n1} + t_2 Z_{n2}}}.$$

⁸ The generalization holds for any finite number of components but, to simplify the discussion,
is stated for two components only. The method used is due to H. Cramér, Random
Variables and Probability Distributions, Cambridge University Press, London, 1937, p. 105.

⁹ H. Cramér, Random Variables and Probability Distributions, Cambridge University
Press, London, 1937, p. 105.
Consequently, to prove that $t_1 Z_{n1} + t_2 Z_{n2}$ has a normal limiting distribution we
need to verify that the sequence $U_N$ satisfies the condition W if $U_{N1}$ and $U_{N2}$ do.
The moments of $U_N$ are

$$\mu_{rN} = \frac{1}{N} \sum_\nu x_{\nu N}^r,$$

so that

$$\mu_{2N} = t_1^2 + t_2^2 + 2 t_1 t_2 \rho_N,$$

where $\rho_N$ has the usual form of the correlation coefficient. Furthermore, using
the binomial expansion, we have

$$\mu_{rN} = \sum_{a=0}^{r} \binom{r}{a} t_1^a t_2^{r-a} \frac{\mu_{a, r-a, N}}{\mu_{2N1}^{a/2}\, \mu_{2N2}^{(r-a)/2}}, \tag{4.4}$$

where

$$\mu_{a, r-a, N} = \frac{1}{N} \sum_\nu (x_{\nu N1} - \bar{x}_{N1})^a (x_{\nu N2} - \bar{x}_{N2})^{r-a}.$$

Then, by the Cauchy-Schwarz inequality, we have

$$\Big[ \sum_\nu (x_{\nu N1} - \bar{x}_{N1})^a (x_{\nu N2} - \bar{x}_{N2})^{r-a} \Big]^2 \leq \sum_\nu (x_{\nu N1} - \bar{x}_{N1})^{2a} \cdot \sum_\nu (x_{\nu N2} - \bar{x}_{N2})^{2r-2a},$$

so that

$$|\mu_{a, r-a, N}| \leq \mu_{2a, N1}^{1/2}\, \mu_{2r-2a, N2}^{1/2},$$

and, using condition W for $U_{N1}$ and $U_{N2}$, we have

$$\mu_{2a, N1}^{1/2}\, \mu_{2r-2a, N2}^{1/2} \leq \lambda\, \mu_{2N1}^{a/2}\, \mu_{2N2}^{(r-a)/2}.$$

Hence, substituting in (4.4), we see that

$$\sup |\mu_{rN}| < \infty.$$

Hence the sequence $U_N$ satisfies the condition W for all $t_1$ and $t_2$, and Theorem
3 is proved.
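Two ingredients of this proof can be checked numerically on an arbitrary bivariate universe. The first is that the combined universe has mean zero and $\mu_{2N} = t_1^2 + t_2^2 + 2t_1t_2\rho_N$. The second is the Cauchy-Schwarz bound $|\mu_{a,r-a,N}| \leq \mu_{2a,N1}^{1/2}\mu_{2r-2a,N2}^{1/2}$ used to control (4.4). The universe below is generated with a fixed seed purely for illustration, and $t_1 = 1.5$, $t_2 = -2$ are arbitrary. The sketch uses floating point, so the checks allow small rounding tolerances.

```python
import math
import random


def central(values, r):
    """(1/N) sum (v - mean)^r."""
    m = sum(values) / len(values)
    return sum((v - m) ** r for v in values) / len(values)


def mixed_central(xs, ys, a, b):
    """mu_{a,b,N} = (1/N) sum (x - xbar)^a (y - ybar)^b."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) ** a * (y - my) ** b for x, y in zip(xs, ys)) / len(xs)


def combined_universe(xs, ys, t1, t2):
    """Elements t1 (x - xbar1)/mu_2N1^(1/2) + t2 (y - xbar2)/mu_2N2^(1/2) from the proof."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    s1, s2 = math.sqrt(central(xs, 2)), math.sqrt(central(ys, 2))
    return [t1 * (x - mx) / s1 + t2 * (y - my) / s2 for x, y in zip(xs, ys)]


rng = random.Random(7)
xs = [rng.gauss(0, 1) for _ in range(500)]            # hypothetical correlated pairs
ys = [0.6 * x + rng.gauss(0, 1) for x in xs]
rho = mixed_central(xs, ys, 1, 1) / math.sqrt(central(xs, 2) * central(ys, 2))
t1, t2 = 1.5, -2.0
u = combined_universe(xs, ys, t1, t2)
print(central(u, 2), t1 ** 2 + t2 ** 2 + 2 * t1 * t2 * rho)
```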

From Theorem 3, it then follows that the theorems on the limiting distributions
of moments, product moments and functions of moments¹⁰ are valid for
sampling from finite universes, at random without replacement.

¹⁰ The most important of these theorems are given in H. Cramér, Mathematical Methods
of Statistics, Princeton University Press, Princeton, 1946, sections 28.2-28.4, pp. 364-367.



A NON-PARAMETRIC TEST OF INDEPENDENCE¹

By Wassily Hoeffding
Institute of Statistics, University of North Carolina 

1. Summary. A test is proposed for the independence of two random variables
with continuous distribution function (d.f.). The test is consistent with respect
to the class $\Omega''$ of d.f.'s with continuous joint and marginal probability densities
(p.d.). The test statistic $D$ depends only on the rank order of the observations.
The mean and variance of $D$ are given, and $\sqrt{n}(D - ED)$ is shown to have a
normal limiting distribution for any parent distribution. In the case of independence
this limiting distribution is degenerate, and $nD$ has a non-normal
limiting distribution whose characteristic function and cumulants are given.
The exact distribution of $D$ in the case of independence for samples of size
$n = 5, 6, 7$ is tabulated. In the Appendix it is shown that there do not exist
tests of independence based on ranks which are unbiased on any significance
level with respect to the class $\Omega''$. It is also shown that if the parent distribution
belongs to $\Omega''$ and for some $n \geq 5$ the probabilities of the $n!$ rank permutations
are equal, the random variables are independent.

2. Introduction. In a non-parametric test of a statistical hypothesis we do
not make any assumptions about the functional form of the population distribution.
A general theory of non-parametric tests is not yet developed, and a
satisfactory definition of "best" non-parametric tests does not seem to be available.
Desirable properties of a "good" non-parametric test are unbiasedness and
consistency. A test of a hypothesis $H_0$ is said to be consistent with respect to a
specified class of admissible hypotheses if the probability of accepting $H_0$ tends
to zero with increasing sample size whenever a hypothesis $H \neq H_0$ of this class
is true.

In this paper we consider the problem of testing the independence of two
random variables $X$, $Y$ on the basis of a random sample of size $n$. In all that
follows, the d.f. $F(x, y)$ of $(X, Y)$ is assumed to be continuous. We will denote
by $\Omega'$ the class of continuous d.f.'s $F(x, y)$ and by $\Omega''$ the class of d.f.'s having
continuous joint and marginal p.d.'s,

$$f(x, y) = \frac{\partial^2 F(x, y)}{\partial x\, \partial y}, \qquad f_1(x) = \int f(x, y)\, dy, \qquad f_2(y) = \int f(x, y)\, dx.$$

The hypothesis $H_0$ to be tested is that $F(x, y)$ is of the form

$$F(x, y) = F(x, \infty)\, F(\infty, y).$$

Several tests of this hypothesis have been proposed. Among them, those which
depend only on the rank order of the observations deserve particular attention.
They will be referred to as rank tests. The critical region of a rank
test of independence with respect to the class $\Omega'$ is similar to the sample space;
the rank tests share this property with other tests obtained by the method of
randomization (cf. Scheffé [1]). A characteristic feature of a rank test is that it
remains invariant under order-preserving transformations of $X$ or $Y$.

¹ Research under a contract with the Office of Naval Research for development of multivariate
statistical theory.

Rank tests of independence have been studied by Hotelling and Pabst [2],
Kendall [3] and Wolfowitz [4]. While nothing is yet known about the power of
the last test, the author [5] has shown that the two former tests are asymptotically
biased for certain alternatives belonging to $\Omega'$. By a slight modification of the
examples given in [5] it can be shown that these tests are asymptotically biased
even with respect to the class $\Omega''$.

In the Appendix it is shown that there do not exist rank tests of independence
which are unbiased on any level of significance with respect to the classes $\Omega'$
or $\Omega''$. It will appear from this paper that there do exist rank tests of independence
which are consistent, and hence asymptotically unbiased, at least with
respect to $\Omega''$.


3. The Functional $\Delta(F)$. Given a random sample from a population with a
d.f. belonging to a class $\Omega$, we want to test the hypothesis $H_0$ that $F$ is in a subclass
$\omega$ of $\Omega$. It is easy to construct a consistent test of $H_0$ if there exist (a) a
functional $\theta(F)$ defined for every $F$ in $\Omega$ and such that $\theta(F) = 0$ if and only if
$F \in \omega$, and (b) a consistent estimate of $\theta(F)$. There are many ways of devising
consistent tests of independence by this method. The particular test described
in the sequel has been chosen mainly for its relative simplicity.

If $F(x, y)$ is a bivariate d.f., let

$$D(x, y) = F(x, y) - F(x, \infty)\, F(\infty, y)$$

and

$$\Delta = \Delta(F) = \int D^2(x, y)\, dF(x, y). \tag{3.1}$$

Here and in the following, when no domain of integration is indicated, the
(Lebesgue-Stieltjes) integral is extended over the entire space (here $R^2$).

The random variables $X$, $Y$ with the d.f. $F(x, y)$ are independent if and only
if $D(x, y) \equiv 0$.

Theorem 3.1. When $F(x, y)$ belongs to $\Omega''$, $\Delta(F) = 0$ if and only if $D(x, y) \equiv 0$.

Proof. Evidently $D(x, y) \equiv 0$ implies $\Delta(F) = 0$.

Now suppose that $D(x, y) \not\equiv 0$. Since $F(x, y)$ is in $\Omega''$, the function
$d(x, y) = f(x, y) - f_1(x) f_2(y)$ is continuous. We have

$$D(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} d(u, v)\, du\, dv.$$

$D(x, y) \not\equiv 0$ implies $d(x, y) \not\equiv 0$, and since

$$\iint d(x, y)\, dx\, dy = 0,$$

there exists a rectangle $Q$ in $R^2$ such that $d(x, y) > 0$ if $(x, y)$ is in $Q$. Hence
$D(x, y) \neq 0$ almost everywhere in $Q$, and $f(x, y) > 0$ in $Q$. Thus

$$\Delta(F) \geq \iint_Q D^2(x, y)\, f(x, y)\, dx\, dy > 0.$$

This completes the proof.

If $F(x, y)$ is discontinuous, we can have $\Delta(F) = 0$ and $D(x, y) \not\equiv 0$. This is,
for instance, the case for the distribution

$$P\{X = 0, Y = 1\} = P\{X = 1, Y = 0\} = \tfrac{1}{2}.$$
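The counterexample can be verified by direct computation. For a discrete $F$, the integral $\int D^2\, dF$ is a finite sum over the atoms. The sketch evaluates $D$ at each atom, where it vanishes, and at the origin, where $D(0, 0) = 0 - \frac{1}{2} \cdot \frac{1}{2} = -\frac{1}{4}$. The names `ATOMS`, `F`, `D` and `Delta` are illustrative, and `float("inf")` stands in for the argument $\infty$.

```python
from fractions import Fraction

# The two-point distribution of the example: mass 1/2 at (0, 1) and at (1, 0).
ATOMS = {(0, 1): Fraction(1, 2), (1, 0): Fraction(1, 2)}
INF = float("inf")


def F(x, y):
    """Joint d.f. F(x, y) = P{X <= x, Y <= y}."""
    return sum((p for (a, b), p in ATOMS.items() if a <= x and b <= y), Fraction(0))


def D(x, y):
    """D(x, y) = F(x, y) - F(x, inf) F(inf, y)."""
    return F(x, y) - F(x, INF) * F(INF, y)


def Delta():
    """Delta(F) = integral of D^2 dF, a finite sum over the atoms for a discrete F."""
    return sum((p * D(a, b) ** 2 for (a, b), p in ATOMS.items()), Fraction(0))
```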

The question remains open whether $\Delta = 0$ implies $D(x, y) \equiv 0$ if $F(x, y)$ is
continuous or absolutely continuous.
In Section 7 it will be shown that

$$0 \leq \Delta \leq \tfrac{1}{30}.$$

The upper bound $\frac{1}{30}$ is attained when $F(x, y)$ is the (continuous) d.f. of a
random variable $(X, Y)$ such that $X$ has any continuous d.f. and $Y = X$ (or,
more generally, $Y$ is a monotone function of $X$).

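Let

$$C(u) = \begin{cases} 1 & \text{if } u \geq 0, \\ 0 & \text{if } u < 0, \end{cases}$$

$$\psi(x_1, x_2, x_3) = C(x_1 - x_2) - C(x_1 - x_3), \tag{3.2}$$

$$\varphi(x_1, y_1; \dots; x_5, y_5) = \tfrac{1}{4}\, \psi(x_1, x_2, x_3)\, \psi(x_1, x_4, x_5)\, \psi(y_1, y_2, y_3)\, \psi(y_1, y_4, y_5).$$

Then we can write

$$\Delta = \int \cdots \int \varphi(x_1, y_1; \dots; x_5, y_5)\, dF(x_1, y_1) \cdots dF(x_5, y_5). \tag{3.3}$$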

4. The Statistic $D$. Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be a random sample from
a population with the d.f. $F(x, y)$, $n \geq 5$, and let

$$D = D_n = \frac{1}{n(n-1)\cdots(n-4)} \sum\nolimits'' \varphi(X_{\alpha_1}, Y_{\alpha_1}; \dots; X_{\alpha_5}, Y_{\alpha_5}), \tag{4.1}$$

where $\sum''$ denotes summation over all $\alpha$ such that

$$\alpha_i = 1, \dots, n; \qquad \alpha_i \neq \alpha_j \text{ if } i \neq j \qquad (i, j = 1, \dots, 5).$$

Since the number of terms in $\sum''$ is $n(n-1)\cdots(n-4)$, we have by (3.3)

$$ED = \Delta. \tag{4.2}$$

Since in the case of independence $ED = 0$, $D$ can assume both positive and
negative values. It will be seen in Section 7 that $-\frac{1}{60} \leq D_n \leq \frac{1}{30}$, the upper
bound $\frac{1}{30}$ being attained for every $n$, while the minimum of $D_n$ apparently increases
with $n$.
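Definition (4.1) can be evaluated literally for small samples. The sketch below implements $C$, $\psi$ and $\varphi$ from (3.2) and averages $\varphi$ over all $n(n-1)\cdots(n-4)$ ordered 5-tuples of distinct sample members. This brute-force evaluation is practical only for small $n$, but it is a useful reference for any faster formula. Samples are given as lists of $(x, y)$ pairs with no ties. The helper `sample_from_ranking` is a hypothetical convenience that builds a sample whose $i$-th smallest $x$ is paired with the $\beta_i$-th smallest $y$.

```python
from fractions import Fraction
from itertools import permutations


def C(u):
    """C(u) = 1 if u >= 0, else 0."""
    return 1 if u >= 0 else 0


def psi(x1, x2, x3):
    return C(x1 - x2) - C(x1 - x3)


def phi(pts):
    """Kernel phi of (3.2) evaluated at five points (x_i, y_i)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4), (x5, y5) = pts
    return Fraction(psi(x1, x2, x3) * psi(x1, x4, x5) * psi(y1, y2, y3) * psi(y1, y4, y5), 4)


def D_bruteforce(sample):
    """D_n of (4.1): the average of phi over all ordered 5-tuples of distinct sample members."""
    tuples = list(permutations(sample, 5))
    return sum((phi(t) for t in tuples), Fraction(0)) / len(tuples)


def sample_from_ranking(beta):
    """Hypothetical helper: points (i, beta_i), i.e. x-rank i paired with y-rank beta_i."""
    return [(i, b) for i, b in enumerate(beta, start=1)]
```

For the perfectly concordant sample $(1,1), \dots, (5,5)$ the sketch returns $\frac{1}{30}$, the upper bound quoted above.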
The random variable $D$ defined by (4.1) belongs to the class of U-statistics
considered by the author [5]. The following properties of $D$ follow immediately
from the results of that paper:

I. Let

$$\Phi(x_1, y_1; \dots; x_5, y_5) = \frac{1}{5!} \sum \varphi(x_{\alpha_1}, y_{\alpha_1}; \dots; x_{\alpha_5}, y_{\alpha_5}),$$

the sum extending over all permutations $(\alpha_1, \dots, \alpha_5)$ of $(1, \dots, 5)$,

$$\Phi_k(x_1, y_1; \dots; x_k, y_k) = \int \cdots \int \Phi(x_1, y_1; \dots; x_k, y_k; x_{k+1}, y_{k+1}; \dots; x_5, y_5)\, dF(x_{k+1}, y_{k+1}) \cdots dF(x_5, y_5), \qquad (k = 1, \dots, 5),$$

$$\zeta_k = \int \cdots \int \{\Phi_k(x_1, y_1; \dots; x_k, y_k) - \Delta\}^2\, dF(x_1, y_1) \cdots dF(x_k, y_k).$$

Then the variance of $D_n$ is

$$\operatorname{var} D_n = \binom{n}{5}^{-1} \sum_{k=1}^{5} \binom{5}{k} \binom{n-5}{5-k} \zeta_k. \tag{4.3}$$

We have

$$25 \zeta_1 \leq n \operatorname{var} D_n \leq 5 \zeta_5,$$

$n \operatorname{var} D_n$ is a decreasing function of $n$, and

$$\lim_{n \to \infty} n \operatorname{var} D_n = 25 \zeta_1. \tag{4.4}$$

II. By Theorem 7.1 of [5], the random variable $\sqrt{n}(D_n - \Delta)$ has a normal limiting
distribution with mean zero and variance $25 \zeta_1$.

It will be seen in section 6 that in the case of independence $\zeta_1 = 0$, so that
the normal limiting distribution of $\sqrt{n} D_n$ is a degenerate one. In this case
$n D_n$ has a non-normal limiting distribution (see section 8).

5. Computation of $D$. From (4.1) and (3.2) we get, after reduction,

$$D = \frac{A - 2(n-2) B + (n-2)(n-3) C}{n(n-1)(n-2)(n-3)(n-4)}, \tag{5.1}$$

where

$$A = \sum_{\alpha=1}^{n} a_\alpha (a_\alpha - 1)\, b_\alpha (b_\alpha - 1), \qquad B = \sum_{\alpha=1}^{n} (a_\alpha - 1)(b_\alpha - 1)\, c_\alpha, \qquad C = \sum_{\alpha=1}^{n} c_\alpha (c_\alpha - 1), \tag{5.2}$$

and

$$a_\alpha = \sum_{\beta=1}^{n} C(X_\alpha - X_\beta) - 1, \qquad b_\alpha = \sum_{\beta=1}^{n} C(Y_\alpha - Y_\beta) - 1,$$

$$c_\alpha = \sum_{\beta=1}^{n} C(X_\alpha - X_\beta)\, C(Y_\alpha - Y_\beta) - 1.$$

$a_\alpha + 1$ and $b_\alpha + 1$ are the ranks of $X_\alpha$ and $Y_\alpha$, respectively. $c_\alpha$ is the number
of sample members $(X_\beta, Y_\beta)$ for which both $X_\beta < X_\alpha$ and $Y_\beta < Y_\alpha$. (Since
$F(x, y)$ is continuous, we may assume that $X_\alpha \neq X_\beta$ and $Y_\alpha \neq Y_\beta$ if $\alpha \neq \beta$.)

Thus, to compute $D$ for a given sample we have to determine the numbers
$a_\alpha$, $b_\alpha$, $c_\alpha$ for each sample member, calculate $A$, $B$, $C$ from (5.2), and insert
them in (5.1).
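The recipe just described translates directly into code. The sketch below computes $a_\alpha$, $b_\alpha$, $c_\alpha$ by counting and then evaluates (5.1) exactly with rational arithmetic. It assumes no ties, as the text does. The function name `hoeffding_D` is an illustrative choice. Note that the value is $D$ itself; some later presentations report a rescaled version such as $30D$, which this sketch does not do. The counting loops are $O(n^2)$, which is ample for illustration.

```python
from fractions import Fraction


def hoeffding_D(xs, ys):
    """D_n via (5.1)-(5.2), assuming no ties (continuous F)."""
    n = len(xs)
    if n < 5:
        raise ValueError("D_n requires n >= 5")
    a = [sum(1 for xb in xs if xb < xa) for xa in xs]      # rank of X_alpha minus 1
    b = [sum(1 for yb in ys if yb < ya) for ya in ys]      # rank of Y_alpha minus 1
    c = [sum(1 for xb, yb in zip(xs, ys) if xb < xa and yb < ya)
         for xa, ya in zip(xs, ys)]                        # members below in both coordinates
    A = sum(ai * (ai - 1) * bi * (bi - 1) for ai, bi in zip(a, b))
    B = sum((ai - 1) * (bi - 1) * ci for ai, bi, ci in zip(a, b, c))
    C = sum(ci * (ci - 1) for ci in c)
    numerator = A - 2 * (n - 2) * B + (n - 2) * (n - 3) * C
    return Fraction(numerator, n * (n - 1) * (n - 2) * (n - 3) * (n - 4))
```

For example, a perfectly concordant or perfectly discordant sample gives $D = \frac{1}{30}$ for every $n \geq 5$, in agreement with the upper bound stated in Section 4.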


6. The variance of $D$ in the case of independence. Since $F(x, y)$ is assumed
to be continuous, so are $F(x, \infty)$ and $F(\infty, y)$. The inequalities $x_1 < x_2$ and
$F(x_1, \infty) < F(x_2, \infty)$ are then equivalent unless $F(x_1, \infty) = F(x_2, \infty)$. The
same is true of $y_1 < y_2$ and $F(\infty, y_1) < F(\infty, y_2)$. This shows that the function $\varphi$,
(3.2), does not change its value if $x_i$, $y_i$ are replaced by $F(x_i, \infty)$, $F(\infty, y_i)$, except
perhaps on a set of zero probability. Hence $\Delta$ and $D$ are invariant under the
transformation

$$u = F(x, \infty), \quad v = F(\infty, y); \qquad U = F(X, \infty), \quad V = F(\infty, Y).$$

In the case of independence we have $F(x, y) = uv$, and

$$\zeta_k = \int_0^1 \cdots \int_0^1 \{\Phi_k^*(u_1, v_1; \dots; u_k, v_k)\}^2\, du_1\, dv_1 \cdots du_k\, dv_k,$$

where $\Phi_k^*$ is defined as $\Phi_k$, with $x_i$, $y_i$ and $F(x_i, y_i)$ replaced by $u_i$, $v_i$ and $u_i v_i$,
respectively. On evaluation of these definite integrals we get

$$\zeta_1 = 0, \quad 30^4 \zeta_2 = 1, \quad 30^4 \zeta_3 = 7, \quad 3 \cdot 30^4 \zeta_4 = 82, \quad 30^4 \zeta_5 = 90.$$

On inserting these values in (4.3) we obtain

$$\operatorname{var}(30 D) = \frac{2(n^2 + 5n - 32)}{9 n (n-1)(n-3)(n-4)}. \tag{6.1}$$

Another way to determine the coefficients $\zeta_k$ in the case of independence is to
compute $\operatorname{var} D_n$ for $n = 5, 6, 7$ from the exact distributions given in section 7,
and $\lim_{n \to \infty} n^2 \operatorname{var} D_n$ from the asymptotic distribution of $n D_n$ (section 8).
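Following the second route just mentioned, (6.1) can be checked against exact enumeration for small $n$. Under independence all $n!$ rankings are equally likely. The sketch therefore fixes the $x$-ranks, runs through every permutation of the $y$-ranks, evaluates $D_n$ with the rank formula (5.1), and compares the exact variance of $30 D_n$ with (6.1). The helper names are hypothetical. Enumeration grows as $n!$, so this check is practical only up to about $n = 8$.

```python
from fractions import Fraction
from itertools import permutations


def hoeffding_D(xs, ys):
    """D_n via (5.1)-(5.2), assuming no ties."""
    n = len(xs)
    a = [sum(1 for xb in xs if xb < xa) for xa in xs]
    b = [sum(1 for yb in ys if yb < ya) for ya in ys]
    c = [sum(1 for xb, yb in zip(xs, ys) if xb < xa and yb < ya) for xa, ya in zip(xs, ys)]
    A = sum(ai * (ai - 1) * bi * (bi - 1) for ai, bi in zip(a, b))
    B = sum((ai - 1) * (bi - 1) * ci for ai, bi, ci in zip(a, b, c))
    C = sum(ci * (ci - 1) for ci in c)
    return Fraction(A - 2 * (n - 2) * B + (n - 2) * (n - 3) * C,
                    n * (n - 1) * (n - 2) * (n - 3) * (n - 4))


def exact_null_variance_30D(n):
    """(Var(30 D_n), E(30 D_n)) when all n! rankings are equally likely."""
    xs = list(range(n))
    values = [30 * hoeffding_D(xs, list(p)) for p in permutations(range(n))]
    mean = sum(values, Fraction(0)) / len(values)
    var = sum(((v - mean) ** 2 for v in values), Fraction(0)) / len(values)
    return var, mean


def formula_61(n):
    """Right-hand side of (6.1)."""
    return Fraction(2 * (n * n + 5 * n - 32), 9 * n * (n - 1) * (n - 3) * (n - 4))
```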


7. The exact distribution of $D$ in the case of independence for $n = 5, 6, 7$.
Let $S = \{(x_1, y_1), \dots, (x_n, y_n)\}$ be a sample from a population with a continuous
d.f. We may confine ourselves to samples with $x_i \neq x_j$ and $y_i \neq y_j$ if
$i \neq j$. Let $(x'_1, y'_{\beta_1}), \dots, (x'_n, y'_{\beta_n})$ be a rearrangement of $(x_1, y_1), \dots, (x_n, y_n)$
such that $x'_1 < x'_2 < \dots < x'_n$ and $y'_1 < y'_2 < \dots < y'_n$. The permutation
$\Pi = (\beta_1, \dots, \beta_n)$ of $(1, \dots, n)$ will be referred to as the ranking of the sample $S$.
where $\sum'$ stands for summation over all $\alpha$ such that $1 \leq \alpha_1 < \alpha_2 < \dots < \alpha_5 \leq n$.
Denoting by $\Pi^{(i)}$ the permutation obtained from $\Pi = (\beta_1, \dots, \beta_n)$ by omitting
$\beta_i$, we have the recursion formula

$$n D_n(\Pi) = \sum_{i=1}^{n} D_{n-1}(\Pi^{(i)}). \tag{7.2}$$

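From (4.1) and (3.2) we obtain

$$60 D_5(\beta_1, \dots, \beta_5) = \psi(\beta_3, \beta_1, \beta_4)\, \psi(\beta_3, \beta_2, \beta_5) + \psi(\beta_3, \beta_1, \beta_5)\, \psi(\beta_3, \beta_2, \beta_4),$$

or

$$60 D_5(\beta_1, \dots, \beta_5) = \begin{cases} 0 & \text{if } \beta_3 \neq 3; \\ 2 & \text{if } \beta_3 = 3 \text{ and } \beta_1, \beta_2 < 3 \text{ or } \beta_1, \beta_2 > 3; \\ -1 & \text{if } \beta_3 = 3 \text{ and } \beta_1 < 3, \beta_2 > 3 \text{ or } \beta_1 > 3, \beta_2 < 3. \end{cases} \tag{7.3}$$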

We have

$$D_n(\beta_1, \dots, \beta_n) = D_n(\beta_2, \beta_1, \beta_3, \dots, \beta_n) = D_n(\beta_1, \dots, \beta_{n-2}, \beta_n, \beta_{n-1}) = D_n(\beta_n, \beta_{n-1}, \dots, \beta_1). \tag{7.4}$$

For $n = 5$ this follows from (7.3), and for general $n$ from (7.1).

Also, by the symmetry of $D_n$ with respect to $x$ and $y$, $D_n$ does not change its
value if in the permutation $(\beta_1, \dots, \beta_n)$ the numbers 1, 2 or $n - 1$, $n$ are interchanged,
or if the permutation is replaced by its inverse.
In the case of independence all $n!$ rankings have the same probability $1/n!$.
To find the distribution of $D_n$ we have to determine the number of rankings
giving rise to particular values of $D_n$.

If $n = 5$ there are $5! = 120$ rankings. Owing to (7.4) we need consider only
those with $\beta_1 < \beta_2$, $\beta_4 < \beta_5$, $\beta_1 < \beta_4$. Their number is $5!/8 = 15$. Among
them, those with $\beta_3 \neq 3$ yield $D_5 = 0$; this leaves only the three permutations

$$(1, 2, 3, 4, 5), \quad (1, 4, 3, 2, 5), \quad (1, 5, 3, 2, 4).$$

By (7.3) the respective values of $60 D_5$ are $2, -1, -1$. Thus we have

$$P\{60 D_5 = 2\} = \tfrac{1}{15}, \qquad P\{60 D_5 = -1\} = \tfrac{2}{15}, \qquad P\{60 D_5 = 0\} = \tfrac{12}{15}.$$
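The $n = 5$ distribution, and the way larger $n$ build on it, can be reproduced mechanically. The sketch tallies the case analysis (7.3) over all 120 rankings. It then extends to $n = 6$ using the recursion $n D_n(\Pi) = \sum_i D_{n-1}(\Pi^{(i)})$, which follows from writing $D_n$ as an average over 5-subsets. It assumes that $\Pi^{(i)}$ is renumbered to a permutation of $1, \dots, n-1$ by replacing the remaining $\beta$'s with their relative ranks. The helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from functools import lru_cache
from itertools import permutations


def sixty_D5(beta):
    """60 D_5 for a ranking beta of (1, ..., 5), following the case analysis (7.3)."""
    b1, b2, b3 = beta[0], beta[1], beta[2]
    if b3 != 3:
        return 0
    if (b1 < 3 and b2 < 3) or (b1 > 3 and b2 > 3):
        return 2
    return -1


def relabel(values):
    """Replace distinct numbers by their relative ranks 1..m (the ranking of a subsample)."""
    order = sorted(values)
    return tuple(order.index(v) + 1 for v in values)


@lru_cache(maxsize=None)
def D(beta):
    """D_n(beta): (7.3) for n = 5, otherwise n D_n = sum_i D_{n-1}(beta with beta_i omitted)."""
    n = len(beta)
    if n == 5:
        return Fraction(sixty_D5(beta), 60)
    return sum((D(relabel(beta[:i] + beta[i + 1:])) for i in range(n)), Fraction(0)) / n


counts = Counter(sixty_D5(p) for p in permutations(range(1, 6)))
print(sorted(counts.items()))
```

The tally gives counts of 96, 16 and 8 for the values 0, $-1$ and 2, that is, probabilities $\frac{12}{15}$, $\frac{2}{15}$ and $\frac{1}{15}$ out of 120.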





The distributions of $D_6$, $D_7$, $\dots$ can be obtained in a similar way using the
relations (7.1) to (7.4). The distribution of $D_n$ for $n = 5, 6, 7$ is given in
Table I.

From (7.3) and (7.1) it follows that $-\frac{1}{60} \leq D_n \leq \frac{1}{30}$ for $n = 5, 6, \dots$.
The upper bound is attained for $\Pi = (1, 2, \dots, n)$ and every $n$. To judge
by the cases $n = 5, 6, 7$, the minimum of $D_n$ apparently increases with $n$. From
$E D_n = \Delta$ it also follows that $\Delta \leq \frac{1}{30}$.


8. The Asymptotic Distribution of $n D_n$ in the Case of Independence.

Theorem 8.1. If $F(x, y) = F(x, \infty) F(\infty, y)$ and $F(x, \infty)$ and $F(\infty, y)$ are continuous,
the random variable $n D_n + \frac{1}{36}$ has a limiting distribution whose characteristic
function (c.f.) is

$$g(t) = \prod_{k=1}^{\infty} \left(1 - \frac{2it}{\pi^4 k^2}\right)^{-r(k)/2}, \tag{8.1}$$

where $r(k)$ is the number of divisors of $k$.

Note that $r(k)$ is the number of divisors of $k$ including 1 and $k$. Thus $r(1) = 1$,
$r(2) = 2$, $r(3) = 2$, $r(4) = 3$, $\dots$.

The author has not been able to bring the d.f. corresponding to the c.f. $g(t)$ into a form suitable for numerical computation. Thus Theorem 8.1 may be considered as a preliminary result. For this reason only a brief indication of the proof is given here.

If $(X_1,Y_1),\dots,(X_n,Y_n)$ is a random sample from a population with d.f. $F(x,y)$, let $nS_n(x,y)$ be the number of sample members $(X_\alpha,Y_\alpha)$ such that $X_\alpha\le x$, $Y_\alpha\le y$. $S_n(x,y)$ is a d.f. depending on the random sample. If we put $F(x,y)=S_n(x,y)$ in $\Delta(F)$ as defined by (3.3), we get

$$\Delta(S_n)=\frac{1}{n^5}\sum_{\alpha_1=1}^{n}\cdots\sum_{\alpha_5=1}^{n}\phi(X_{\alpha_1},Y_{\alpha_1};\dots;X_{\alpha_5},Y_{\alpha_5}).$$


It is easy to prove that if $n\{\Delta(S_n)-E\Delta(S_n)\}$ has a limiting distribution, it is the same as that of $nD_n$.

Now it can be shown that $n\Delta(S_n)$ has a limiting distribution with the c.f. (8.1). This can be done either analogously to Smirnoff's [6] derivation of the limiting distribution of the goodness of fit statistic, or by applying von Mises' [7] general results on the asymptotic distribution of a differentiable statistical function. Though the latter paper deals only with univariate distributions, its results can be extended to the multivariate case.

By expanding $\log g(t)$ in powers of $it$ we obtain for the $j$-th cumulant $\kappa_j$

$$\kappa_j=\frac{2^{5j-3}(j-1)!\,B_j^2}{[(2j)!]^2},\qquad j=1,2,\dots,$$

where the $B_j$ are Bernoulli's numbers,

$$B_1=\tfrac{1}{6},\qquad B_2=B_4=\tfrac{1}{30},\qquad B_3=\tfrac{1}{42},\qquad\dots.$$

In particular $\kappa_1=\tfrac{1}{36}$, and since $ED_n=0$, the limiting distribution of $n\Delta(S_n)$ is that of $nD_n+\tfrac{1}{36}$.
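As a check on the cumulant formula, the sketch below evaluates $\kappa_j$ exactly in rational arithmetic, converting the modern Bernoulli numbers to the older notation used here ($B_j$ = absolute value of the modern $B_{2j}$), and compares it with a truncated version of the series obtained by expanding the logarithm of the product (8.1) term by term, $\kappa_j=(j-1)!\,2^{j-1}\pi^{-4j}\sum_k\tau(k)k^{-2j}$. The truncation point `k_max` is an arbitrary choice made for the sketch.

```python
from fractions import Fraction
from math import comb, factorial, pi


def bernoulli_modern(m_max):
    """Bernoulli numbers B_0..B_{m_max} in the modern convention (B_1 = -1/2, B_2 = 1/6)."""
    B = [Fraction(1)]
    for m in range(1, m_max + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B


def kappa(j):
    """Exact j-th cumulant 2^(5j-3) (j-1)! B_j^2 / ((2j)!)^2 with old-style B_j."""
    b_j = abs(bernoulli_modern(2 * j)[2 * j])
    return Fraction(2 ** (5 * j - 3) * factorial(j - 1)) * b_j ** 2 / factorial(2 * j) ** 2


def kappa_from_divisors(j, k_max=2000):
    """Same cumulant from log g(t): (j-1)! 2^(j-1) pi^(-4j) sum_k tau(k) k^(-2j), truncated."""
    tau = [0] * (k_max + 1)
    for d in range(1, k_max + 1):              # divisor-counting sieve
        for multiple in range(d, k_max + 1, d):
            tau[multiple] += 1
    series = sum(tau[k] / k ** (2 * j) for k in range(1, k_max + 1))
    return factorial(j - 1) * 2 ** (j - 1) * series / pi ** (4 * j)


print(kappa(1), kappa(2))
```

The first two values are the limiting mean $\tfrac{1}{36}$ of $n\Delta(S_n)$ and its limiting variance $\tfrac{1}{4050}$.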


9. The D-test of Independence. Given a random sample from a bivariate population with continuous d.f., a test for independence can now be carried out as follows.

If $\alpha$ $(0<\alpha<1)$ is the desired level of significance, let $p_n$ be the smallest number satisfying the inequality

$$P\{D_n>p_n\mid F\in\omega\}\le\alpha,$$

where $\omega$ is the class of d.f.'s of the form $F(x,\infty)F(\infty,y)$.

Compute $D_n$ as shown in section 5. Reject the hypothesis $H_0$ of independence if and only if $D_n>p_n$.

For $n=5, 6, 7$ the numbers $p_n$ can be obtained from Table I.
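The rule for reading $p_n$ off Table I can be written out as follows. The sketch simply transcribes the $n=5, 6, 7$ columns of Table I (counts and scale factors as in the column headings) and returns the smallest support point $p$ with $P\{D_n>p\}\le\alpha$; since $P\{D_n>p\}$ is a right-continuous step function of $p$, the smallest such $p$ is always a support point. The names `TABLE_I`, `critical_value` and `reject_independence` are illustrative.

```python
from fractions import Fraction

# Table I as n -> (total, scale, {x: total * P{scale * D_n = x}}).
TABLE_I = {
    5: (15, 60, {-1: 2, 0: 12, 2: 1}),
    6: (90, 180, {-2: 4, -1: 28, 0: 36, 1: 16, 2: 1, 3: 4, 6: 1}),
    7: (630, 1260, {-11: 8, -8: 32, -7: 32, -6: 8, -5: 28, -4: 88, -3: 64,
                    -2: 56, -1: 8, 0: 88, 2: 77, 3: 24, 4: 4, 6: 56, 8: 8,
                    9: 4, 12: 24, 14: 2, 18: 12, 24: 2, 30: 4, 42: 1}),
}


def critical_value(n, alpha):
    """Smallest p_n with P{D_n > p_n} <= alpha under independence, read from Table I."""
    total, scale, counts = TABLE_I[n]
    for x in sorted(counts):
        tail = Fraction(sum(c for v, c in counts.items() if v > x), total)
        if tail <= alpha:          # always reached: the largest point has tail 0
            return Fraction(x, scale)


def reject_independence(d_n, n, alpha):
    """The D-test: reject H_0 if and only if D_n > p_n."""
    return d_n > critical_value(n, alpha)
```

For $n=7$ and $\alpha=0.05$ this gives $p_7=12/1260$, so $H_0$ is rejected when $1260D_7\ge 14$; the exact size of that test is $21/630\approx 0.033$.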

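From Tchebychef's inequality and (6.1) we have

$$P\{|30D_n|\ge t\}\le\frac{2(n^2+5n-32)}{9n(n-1)(n-3)(n-4)\,t^2}.$$

Hence

$$P\left\{30D_n\ge\sqrt{\frac{2(n^2+5n-32)}{9n(n-1)(n-3)(n-4)\,\alpha}}\right\}\le\alpha,\qquad\text{so that}\qquad 30p_n\le\sqrt{\frac{2(n^2+5n-32)}{9n(n-1)(n-3)(n-4)\,\alpha}}.$$

It follows that $p_n=O(n^{-1})$.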
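The Tchebychef bound is easy to evaluate. The sketch below codes the variance of $30D_n$ under independence as it enters the inequality above, checks it against the exact $n=5$ distribution of Table I (where $E(60D_5)^2=\tfrac{6}{15}$), and computes the resulting upper bound for $p_n$; `var_30d` and `chebyshev_bound` are illustrative names.

```python
import math
from fractions import Fraction


def var_30d(n):
    """Var(30 D_n) under independence (n >= 5), as used with Tchebychef's inequality."""
    return Fraction(2 * (n * n + 5 * n - 32), 9 * n * (n - 1) * (n - 3) * (n - 4))


def chebyshev_bound(n, alpha):
    """Upper bound on p_n from P{30 D_n >= c} <= alpha with c = sqrt(Var(30 D_n) / alpha)."""
    return math.sqrt(var_30d(n) / alpha) / 30


# Exact n = 5 check: 60 D_5 = -1, 0, 2 with weights 2, 12, 1 out of 15, and E D_5 = 0,
# so Var(30 D_5) = E(60 D_5)^2 / 4.
second_moment_60d5 = Fraction(2 * 1 + 12 * 0 + 1 * 4, 15)
assert var_30d(5) == second_moment_60d5 / 4
```

For $n=7$ and $\alpha=0.05$ the bound gives $1260p_7\le 28.4$ approximately, against the exact value 12 from Table I, so the inequality mainly serves to establish the order $O(n^{-1})$: $n$ times the bound tends to $\sqrt{2/(9\alpha)}/30$.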

If $\Delta>0$, we have $\Delta-p_n>0$ for sufficiently large $n$. Then

$$P\{D_n>p_n\}\ge P\{|D_n-\Delta|<\Delta-p_n\}\ge 1-\frac{\operatorname{Var}D_n}{(\Delta-p_n)^2}.$$

By (4.4) the right-hand side tends to 1. This, together with Theorem 3.1, shows that the D-test is consistent with respect to the class $\Omega''$.

Since $P\{D_n\le 0\}$ tends to 0 if $\Delta>0$, it is safe not to reject whenever $D_n\le 0$. An inspection of Table I shows that, at least for small $n$, this will happen in more than one-half of the cases if $H_0$ is true.


10. Concluding Remarks. It would be interesting to compare the power of the D-test with that of other tests with respect to particular alternatives, for instance with the product moment correlation test when the population is normal with correlation $\rho$. A preliminary investigation seems to indicate that for small values of $|\rho|$ and $n\to\infty$ the power efficiency of the D-test as compared with the common correlation test [...]. This result [...] for values of $n$ which are of practical interest. On the other hand, it is to be expected that a test which is consistent with respect to a large class of alternatives will have a lower power with regard to a sub-class of alternatives than a test which has optimum properties with respect to this particular sub-class. These considerations suggest the problem of selecting from a given class of non-parametric tests (such as those consistent with respect to $\Omega''$) a test which is most powerful with respect to certain parametric alternatives (such as normal distributions).


TABLE I

The distribution of $D_n$ in the case of independence for $n = 5, 6, 7$.

n = 5

      x    15P{60D_5 = x}    P{60D_5 ≥ x}
     -1          2              1.0000
      0         12              0.8667
      2          1              0.0667

n = 6

      x    90P{180D_6 = x}   P{180D_6 ≥ x}
     -2          4              1.0000
     -1         28              0.9556
      0         36              0.6444
      1         16              0.2444
      2          1              0.0667
      3          4              0.0556
      6          1              0.0111

n = 7

      x    630P{1260D_7 = x}   P{1260D_7 ≥ x}
    -11          8               1.0000
     -8         32               0.9873
     -7         32               0.9365
     -6          8               0.8857
     -5         28               0.8730
     -4         88               0.8286
     -3         64               0.6889
     -2         56               0.5873
     -1          8               0.4984
      0         88               0.4857
      2         77               0.3460
      3         24               0.2238
      4          4               0.1857
      6         56               0.1794
      8          8               0.0905
      9          4               0.0778
     12         24               0.0714
     14          2               0.0333
     18         12               0.0302
     24          2               0.0111
     30          4               0.0079
     42          1               0.0016


APPENDIX 

A. Equiprobable rankings and independence. Let $\Pi_{n,\nu}$ $(\nu=1,2,\dots,n!)$ be the $n!$ possible rankings of samples of size $n$ from a bivariate population with continuous d.f. $F(x,y)$ (cf. section 7).

If $F(x,y)=F(x,\infty)F(\infty,y)$ we have

$$(A1)\qquad P\{\Pi_{n,\nu}\}=1/n!\qquad(\nu=1,\dots,n!)$$

for every $n$.

Does (A1) for some particular $n$ imply independence? This is not true for $n=2$. In this case (A1) is equivalent to $P\{(1,2)\}=\tfrac12$. If the distribution has a p.d.f. $f(x,y)$, we have

$$P\{(1,2)\}=2\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left[\int_{-\infty}^{x}\int_{-\infty}^{y}f(u,v)\,du\,dv\right]f(x,y)\,dx\,dy,$$

which equals $\tfrac12$ whenever $f(x,y)=f(-x,y)$. However, we have the following theorem:

Theorem. If $F(x,y)$ is in $\Omega''$ and (A1) holds for some $n\ge 5$, then

$$(A2)\qquad F(x,y)=F(x,\infty)F(\infty,y).$$

Proof. (4.2) can be written in the form

$$(A3)\qquad\sum_{\nu=1}^{n!}D_n(\Pi_{n,\nu})\,P\{\Pi_{n,\nu}\}=\Delta.$$

If (A1) holds, the left-hand side of (A3) has the same value as when (A2) is true. But in the latter case we have $\Delta=0$. Hence (A1) implies $\Delta=0$. By Theorem 3.1 this is sufficient for (A2). The proof is complete.
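The $n=2$ counterexample can be illustrated by simulation. Both sampling distributions below are hypothetical examples chosen for the sketch: in the first, $X$ is uniform on $(-1,1)$ and $Y=X^2$ plus small Gaussian noise, so that $f(x,y)=f(-x,y)$ although $Y$ clearly depends on $X$; the second, $Y=X$ plus noise, is a contrast case. The function estimates $P\{(1,2)\}$, the probability that two independent draws are ranked the same way in $x$ and in $y$.

```python
import random


def concordance_rate(draw, trials, seed=0):
    """Monte Carlo estimate of P{(1, 2)}: two draws ordered the same way in x and y."""
    rng = random.Random(seed)
    concordant = 0
    for _ in range(trials):
        x1, y1 = draw(rng)
        x2, y2 = draw(rng)
        if (x1 - x2) * (y1 - y2) > 0:
            concordant += 1
    return concordant / trials


def symmetric_but_dependent(rng):
    """Hypothetical density with f(x, y) = f(-x, y): Y = X^2 + noise, X uniform on (-1, 1)."""
    x = rng.uniform(-1.0, 1.0)
    return x, x * x + 0.1 * rng.gauss(0.0, 1.0)


def positively_dependent(rng):
    """Hypothetical contrast case: Y = X + noise."""
    x = rng.uniform(-1.0, 1.0)
    return x, x + 0.1 * rng.gauss(0.0, 1.0)
```

The first distribution satisfies (A1) for $n=2$ up to sampling error while being far from independent, whereas the second gives a concordance probability well above $\tfrac12$.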


B. Non-existence of unbiased rank tests of independence. 

Theorem. There do not exist rank tests of independence which are unbiased on any significance level with respect to the classes $\Omega'$ or $\Omega''$.

Proof. Let $\Pi_n$, $P$ have the meaning of Appendix A. Any critical region of a rank test of independence is a set $S_m=\{\Pi_{n,\nu_1},\dots,\Pi_{n,\nu_m}\}$ of $m$ rankings. In the case of independence $P(S_m)=P\{\Pi_n\in S_m\}=m/n!$. We may confine ourselves to significance levels $m/n!$, $m=1,2,\dots,n!-1$. To prove the theorem it is sufficient to show that for every $n=2,3,\dots$, for some $m$ $(1\le m\le n!-1)$ and every $S_m$ there exists a d.f. $F$ in $\Omega''$ such that

$$P(S_m\mid F)<m/n!.$$

We shall prove the slightly more general proposition that this holds for $m=1,2,3$.

Let the bivariate distribution $A_n$ be such that the probability mass is distributed uniformly on the $n-1$ segments

$$(B1)\qquad\frac{k-1}{n-1}<x<\frac{k}{n-1},\qquad y-x=\frac{n-2k}{n-1}\qquad(k=1,2,\dots,n-1),$$

and is zero in any region not containing a part of these segments.

Let $B_n$ be the distribution which is uniform on the $n-1$ segments

$$(B2)\qquad\frac{k-1}{n-1}<x<\frac{k}{n-1},\qquad x+y=\frac{2k-1}{n-1}\qquad(k=1,2,\dots,n-1),$$

and zero elsewhere.
The d.f.'s of both $A_n$ and $B_n$ are continuous, with

$$F(x,\infty)=F(\infty,x)=x\qquad(0\le x\le 1).$$

Since the probability of $(X,Y)$ lying on any one of the segments (B1) or (B2) is $1/(n-1)$, the probabilities $P(\Pi\mid A_n)$ and $P(\Pi\mid B_n)$ are easily obtained in terms of the multinomial distribution with $n-1$ equal probabilities. In particular, we have

$$(B3)\qquad P(1,2,\dots,n\mid A_2)=1,\qquad P(n,n-1,\dots,1\mid B_2)=1;$$

$$(B4)\qquad P(1,2,\dots,n\mid A_n)=P(n,n-1,\dots,1\mid B_n)=(n-1)\left(\frac{1}{n-1}\right)^{n}=\left(\frac{1}{n-1}\right)^{n-1},$$

$$P(n,n-1,\dots,1\mid A_n)=P(1,2,\dots,n\mid B_n)=0.$$
In general, if $\Pi_n$ is any permutation of $1,\dots,n$, we have either $P(\Pi_n\mid A_n)=0$ or $P(\Pi_n\mid B_n)=0$. Indeed, every $\Pi_n$ with $P(\Pi_n\mid A_n)\neq 0$ contains at least one "run up" of 2 or more numbers (a sequence of consecutive numbers $i, i+1, \dots, i+h$) which is not preceded by smaller numbers or followed by larger numbers. On the other hand, if a $\Pi'_n$ with $P(\Pi'_n\mid B_n)\neq 0$ contains a "run up", it is either preceded by smaller numbers or followed by larger numbers. Hence if $P(\Pi_n\mid A_n)\neq 0$, then $P(\Pi_n\mid B_n)=0$. Similarly, $P(\Pi_n\mid B_n)\neq 0$ implies $P(\Pi_n\mid A_n)=0$.

From (B3) it follows that for any set $S_m$ of $m$ rankings which does not include $(1,2,\dots,n)$ or $(n,n-1,\dots,1)$ we have either $P(S_m\mid A_2)=0$ or $P(S_m\mid B_2)=0$. Hence we need only consider critical regions containing both $(1,2,\dots,n)$ and $(n,n-1,\dots,1)$. For $m=1$ there are no such regions. For $m=2$ there is just one. But from (B4) it follows that for $n>2$,

$$P(1,2,\dots,n\mid A_n)+P(n,n-1,\dots,1\mid A_n)=\left(\frac{1}{n-1}\right)^{n-1}<\frac{2}{n!}.$$

Finally, if $\Pi_n$ is any permutation other than $(1,2,\dots,n)$ or $(n,n-1,\dots,1)$, we have, by the preceding arguments, either for $A_n$ or for $B_n$,

$$P(1,2,\dots,n)+P(n,n-1,\dots,1)+P(\Pi_n)=\left(\frac{1}{n-1}\right)^{n-1}<\frac{3}{n!}.$$

This completes the proof for d.f.'s in $\Omega'$. To prove the theorem for d.f.'s in $\Omega''$ we can replace the distributions $A_n$ and $B_n$ by distributions $A'_n$ and $B'_n$ having continuous joint and marginal densities and such that the probabilities $P(\Pi\mid A'_n)$ and $P(\Pi\mid B'_n)$ differ as little as we please from $P(\Pi\mid A_n)$ and $P(\Pi\mid B_n)$, respectively. For instance, $A'_2$ can be defined by the continuous density
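$$f(x,y)=\begin{cases}K(x-y+\varepsilon) & \text{if } 0\le y-x\le\varepsilon,\ x\le 1-\varepsilon,\ y\ge\varepsilon;\\ K(\varepsilon-x+y) & \text{if } -\varepsilon\le y-x\le 0,\ x\ge\varepsilon,\ y\le 1-\varepsilon;\\ K(x+y-\varepsilon) & \text{if } x+y\ge\varepsilon,\ x<\varepsilon,\ y<\varepsilon;\\ K(2-\varepsilon-x-y) & \text{if } x+y\le 2-\varepsilon,\ x>1-\varepsilon,\ y>1-\varepsilon;\\ 0 & \text{elsewhere},\end{cases}$$

where $K=3/(3\varepsilon^2-4\varepsilon^3)$ and $0<\varepsilon<\tfrac12$. If $\varepsilon$ is taken sufficiently small, the distribution satisfies the requirements. The details are left to the reader.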
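The normalizing constant $K=3/(3\varepsilon^2-4\varepsilon^3)$ of this smoothed density can be checked numerically. The density is a piecewise-linear ridge $K(\varepsilon-|y-x|)$ along the diagonal of the unit square, cut off near the corners $(0,0)$ and $(1,1)$; the sketch evaluates it with the case distinctions given above and integrates it by the midpoint rule. The grid size is an arbitrary choice.

```python
import numpy as np


def smoothed_density(x, y, eps):
    """Piecewise-linear density of A'_2: a ridge of half-width eps along y = x."""
    K = 3.0 / (3.0 * eps**2 - 4.0 * eps**3)
    d = y - x
    conditions = [
        (d >= 0) & (d <= eps) & (x <= 1 - eps) & (y >= eps),
        (d >= -eps) & (d < 0) & (x >= eps) & (y <= 1 - eps),
        (x + y >= eps) & (x < eps) & (y < eps),                  # corner near (0, 0)
        (x + y <= 2 - eps) & (x > 1 - eps) & (y > 1 - eps),      # corner near (1, 1)
    ]
    values = [K * (x - y + eps), K * (eps - x + y), K * (x + y - eps), K * (2 - eps - x - y)]
    return np.select(conditions, values, default=0.0)


def total_mass(eps, cells=1000):
    """Midpoint-rule integral of the density over the unit square."""
    mid = (np.arange(cells) + 0.5) / cells
    x, y = np.meshgrid(mid, mid)
    return float(smoothed_density(x, y, eps).sum()) / cells**2


print(total_mass(0.1), total_mass(0.2))
```

Both totals should be 1 up to discretization error, confirming that $K$ normalizes the density for any admissible $\varepsilon$.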

The proof also shows the non-existence of an unbiased rank test of independence for $n=2$ and any level of significance (for we need consider only one level, $\tfrac12$). It also can be shown that for $n=3$, any $m=1,2,\dots,5$ and any $S_m$ the inequality $P(S_m)<m/3!$ holds for at least one of the distributions $A_2$, $A_3$, $B_3$. The question remains open whether there exist rank tests of independence which are unbiased for some sample sizes $n$ and some significance levels $m/n!$.
ON PREDICTION IN STATIONARY TIME SERIES
By Herman O. A. Wold
Uppsala University

Summary. In time series analysis there are two lines of approach, here called the functional and the stochastic. In the former case, the given time series is interpreted as a mathematical function, in the latter case as a random specimen out of a universe of mathematical functions. The close relation between the two approaches is in section 2 shown to amount to a genuine isomorphism. Considering the problem of prediction from this viewpoint, the author gives in sections 3-4 the functional equivalent of his earlier theorem on the decomposition of a stationary stochastic process with a discrete time parameter (see [9], theorem 7). In section 5 the decomposition theorem is applied to the problem of linear prediction. Finally, in section 6 a few comments are made. Since various aspects of the isomorphism in question are known, this paper might be regarded as essentially expository.

1. Introductory. Let the sequence

$$(1)\qquad\dots,x_{t-1},x_t,x_{t+1},\dots$$

be an empirical time series such that no clear trend is present in the average level, in the variance or in any other structural properties of the series which we might choose to consider. Such series are usually called stationary as distinct from evolutive, terms which of course are somewhat loose when referring to empirical data. We shall consider two approaches in the theoretical analysis of stationary series. It is convenient to allow $x_t$ to be complex; the conjugate complex of $x_t$ is denoted $\bar{x}_t$.

In the functional approach, the sequence (1) is regarded as forming an infinite sequence, say $\{x_t\}$, where $t$ runs from $-\infty$ to $+\infty$. To define stationarity, let us for any infinite sequence $\{z_t\}$ write

$$(2)\qquad M[z_t]=\lim\frac{1}{t_2-t_1+1}\sum_{t=t_1}^{t_2}z_t\qquad(t_1\to-\infty,\ t_2\to+\infty).$$

The limit $M[z_t]$, which will be called "the average of $z_t$", is clearly independent of $t$. It is also seen that a necessary and sufficient condition for $M[z_t]$ to exist is that the same average should be obtained when $t_1$ is kept fixed while $t_2\to+\infty$, and when $t_2$ is kept fixed while $t_1\to-\infty$. The stationarity of the sequence (1) may now be brought out by assumptions of the type that the averages $M[x_t]$ and $M[\bar{x}_t x_{t+k}]$ exist, say

$$(3)\qquad M[x_t]=m,\qquad M[\bar{x}_t x_{t+k}]=r_k\qquad(k=0,\pm1,\pm2,\dots).$$
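The averages (2) and (3) can be approximated on a finite window $t_1\le t\le t_2$. The sketch below does this for two hypothetical sequences, $x_t=e^{i\omega t}$ and $x_t=\cos\omega t$. For the first, $r_k=e^{ik\omega}$ exactly, so that in the spectral representation (6) of section 2 the function $F(\lambda)$ has a single unit jump at $\lambda=\omega$; for the second, $r_k=\tfrac12\cos k\omega$, corresponding to jumps of $\tfrac14$ at $\pm\omega$. The window length is an arbitrary choice.

```python
import cmath
import math


def time_average(z, t1, t2):
    """(1 / (t2 - t1 + 1)) * sum of z(t), t = t1..t2: a finite-window version of M[z_t] in (2)."""
    return sum(z(t) for t in range(t1, t2 + 1)) / (t2 - t1 + 1)


def autocovariance(x, k, t1, t2):
    """Finite-window estimate of r_k = M[conj(x_t) x_{t+k}] in (3)."""
    return time_average(lambda t: x(t).conjugate() * x(t + k), t1, t2)


omega = 0.7  # hypothetical frequency


def rotation(t):
    """x_t = exp(i omega t)."""
    return cmath.exp(1j * omega * t)


def cosine(t):
    """x_t = cos(omega t)."""
    return math.cos(omega * t)


print(time_average(rotation, -10000, 10000), autocovariance(cosine, 1, -10000, 10000))
```

Widening the window drives $M[x_t]$ towards 0 for both sequences, while the estimates of $r_k$ settle at the values stated above.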

In the stochastic (or probabilistic) approach, we introduce an infinite sequence of random variables, say

$$(4)\qquad\dots,\xi_{t-1},\xi_t,\xi_{t+1},\dots,$$


or briefly $\{\xi_t\}$. The sequence $\{\xi_t\}$ may be regarded as the generalization of the notion of a multi-dimensional variable, say $[\xi_1,\dots,\xi_n]$, to an infinite number of components $\xi_t$. According to a basic theorem by A. Kolmogoroff (see e.g. [9], §11), the probability distribution of the sequence $\{\xi_t\}$ may be defined by specifying for any finite set of variables, say $[\xi_{t_1},\dots,\xi_{t_n}]$, its multi-dimensional distribution function, say

$$(5)\qquad F(u_1,\dots,u_n;t_1,\dots,t_n)=\text{Prob}(\xi_{t_1}<u_1,\dots,\xi_{t_n}<u_n).$$

The sequence $\{\xi_t\}$ thus defined is said to constitute a stochastic process. As is sufficient for our purpose, we confine ourselves to the case when the time parameter $t$ is restricted to discrete values, $t=0,\pm1,\pm2,\dots$.



Now in the stochastic approach, the empirical time series (1) is regarded as a sample specimen, a realization, of the stochastic process $\{\xi_t\}$, just as a point $[x_1,\dots,x_n]$ in an $n$-dimensional space may be regarded as a sample specimen of a multidimensional variable $[\xi_1,\dots,\xi_n]$. In line with this interpretation, the process $\{\xi_t\}$ may be regarded as a universe of individual realizations such as (1) (see the graph). Taking out a realization at random from this universe, we shall have the probability

$$F(u_1;t_1)=\text{Prob}(\xi_{t_1}<u_1)$$

that the value taken on by the realization at the time point $t_1$ will be $<u_1$; similarly,

$$F(u_1,u_2;t_1,t_2)=\text{Prob}(\xi_{t_1}<u_1,\ \xi_{t_2}<u_2)$$

is the joint probability that the values taken on by the realization at $t_1$ and $t_2$ will be $<u_1$ and $<u_2$ respectively.

Any expectation referring to the variables (4) may be expressed in terms of the distribution functions (5), for instance

$$E[\xi_t]=\int u\,d_uF(u;t),\qquad E[\bar{\xi}_{t_1}\xi_{t_2}]=\iint\bar{u}v\,d_{u,v}F(u,v;t_1,t_2).$$

Again interpreting in terms of the universe of realizations, $E[\xi_t]$, say, is the average, over this universe, of the value taken by the realizations at the time point $t$.

The above definition of a stochastic process (4) being perfectly general, we have to impose special assumptions if we wish to take into account particular properties of the given time series (1). Thus stationarity of the process (4) may be defined by assuming that any probability of the type (5) will remain the same if $t_1,\dots,t_n$ is replaced by $t_1+t,\dots,t_n+t$, where $t$ is arbitrary. Alternatively, and more generally, the stationarity of the sequence (1) may be brought out in this approach by assuming that the expectations

$$E[\xi_t]=a,\qquad E[\bar{\xi}_t\xi_{t+k}]=\rho_k$$

exist and are independent of $t$.

2. The functional and stochastic approaches are closely related as to problems and results. A typical example is that $r_k$ and $\rho_k$ as defined above allow the representations¹

$$(6)\qquad r_k=\int_{-\pi}^{\pi}e^{ik\lambda}\,dF(\lambda),\qquad\rho_k=\int_{-\pi}^{\pi}e^{ik\lambda}\,d\Phi(\lambda)\qquad(k=0,\pm1,\pm2,\dots),$$

where $F(\lambda)$ and $\Phi(\lambda)$ are real, bounded and never decreasing functions. We shall now show that the parallelism between the two approaches amounts to a mathematical isomorphism. On the one hand, we recall that A. Kolmogoroff [3], [4] has introduced and studied the notion of a stationary sequence in Hilbert space (let such a sequence be denoted $\{X_t\}$), and shown that a stationary stochastic process $\{\xi_t\}$ forms a particular realization of this general, abstract $\{X_t\}$. On the other hand, the following elementary lemma shows that another realization of $\{X_t\}$ may be formed on the basis of a stationary sequence $\{x_t\}$ such as (1).

Lemma. Let $\{x_t\}$ be a sequence of type (1) which satisfies the conditions (3) but is arbitrary in other respects. We write

$$(7)\qquad\{\mathbf{x}_t\}=\dots,\mathbf{x}_{t-1},\mathbf{x}_t,\mathbf{x}_{t+1},\dots,$$

where $\mathbf{x}_t=\{x_t\}$, and $\mathbf{x}_{t+k}$ is obtained from $\mathbf{x}_t$ by replacing $x_t$ by $x_{t+k}$ for every $t$.

¹ As to $r_k$, see N. Wiener [8], who treats the case of a continuous time parameter $t$. As to $\rho_k$, see H. Wold [9], p. 66, and A. Kolmogoroff [4], p. 5.
For the elements $\mathbf{x}_t$, let multiplication by a real or complex constant and addition be defined by

$$a\mathbf{x}_t=\{ax_t\},\qquad\mathbf{x}_t+\mathbf{y}_t=\{x_t+y_t\};$$

and let $R$ be the class formed by all elements of the type

$$c_{-n}\mathbf{x}_{t-n}+c_{-n+1}\mathbf{x}_{t-n+1}+\dots+c_0\mathbf{x}_t+\dots+c_n\mathbf{x}_{t+n},$$

where $n$ and $c_{-n},\dots,c_n$ are arbitrary. Let the inner product $(\mathbf{x}_t,\mathbf{y}_t)$ of two elements $\mathbf{x}_t=\{x_t\}$, $\mathbf{y}_t=\{y_t\}$ in $R$ be defined by

$$(\mathbf{x}_t,\mathbf{y}_t)=M[\bar{x}_t y_t],$$

and let $R'$ be the closure of $R$.

Then $R'$ is a space the dimension of which is denumerable or finite. In the former case, $R'$ satisfies the conditions of a Hilbert space $H$; in the latter case it can be extended to a Hilbert space $H$. In any case, the relations

$$(8)\qquad U\mathbf{x}_t=\mathbf{x}_{t+1},\qquad-\infty<t<+\infty,$$

define a unitary transformation $U$ in $H$.

The first statement of the lemma is obvious. It is also easily verified that $R'$ satisfies the conditions A-C of an abstract Hilbert space as defined by B. v. Sz. Nagy [7]. If $R'$ is of finite dimension, a suitable extension will make $R'$ satisfy the conditions A-E of a Hilbert space as defined by M. H. Stone [6]. The transformation $U$ is clearly unitary; it is also plain that the definition (8) of $U$ extends to the whole of $H$.

Now since both (4) and (7) are particular realizations of a stationary sequence $\{X_t\}$ in Hilbert space, any theorem on such a sequence $\{X_t\}$ will give, as immediate corollaries, similar theorems on a stationary sequence $\{x_t\}$ of type (1) and on a stationary stochastic process $\{\xi_t\}$. Generally speaking, the former corollary will involve averages of one or more functional sequences $\{x_t\},\{y_t\},\dots$ over time $t$, while the latter will involve averages, for fixed $t$, over the realizations of one or more stochastic processes $\{\xi_t\},\{\eta_t\},\dots$.

Let us consider the following problem of prediction in the light of the isomorphism established: Suppose the data (1) are known up to $t-1$, say for $t-1,t-2,\dots,t-n$; what can then be said about $x_t$, or, more generally, about $x_{t+k}$? One approach to the problem is to apply harmonic analysis to the given data, and to extrapolate the function obtained up to the time point $t+k$. Another approach, the one which we shall consider, is to approximate $x_{t+k}$ directly in terms of the given data. Confining ourselves to linear prediction, and making use of $n$ observations, the prediction formula will then be

$$(9)\qquad\text{pred. }x_{t+k}=a_0^{(n,k)}+a_1^{(n,k)}x_{t-1}+a_2^{(n,k)}x_{t-2}+\dots+a_n^{(n,k)}x_{t-n}.$$

The error of prediction, also called the residual, is denoted

$$(10)\qquad y_{t+k}^{(n,k)}=x_{t+k}-\text{pred. }x_{t+k}.$$
Considering first the functional approach, we apply formula (9) for all $t$, thus obtaining the residuals

$$\dots,y_{t-1}^{(n,k)},y_t^{(n,k)},y_{t+1}^{(n,k)},\dots.$$

In this approach we are led to regard the residual variance, i.e.

$$(11)\qquad M[|y_t^{(n,k)}|^2],$$

as a total measure of the accuracy of the prediction. If we follow the stochastic approach, on the other hand, the formula (9) is applied, for fixed $t$, to all realizations $\{x_t\}$ of the process $\{\xi_t\}$. In this case, the variance expectation

$$(12)\qquad E[|y_t^{(n,k)}|^2]$$

is regarded as a total measure of the accuracy of the prediction. The prediction coefficients are determined by minimizing the expressions (11) and (12), respectively.² It needs no further comment that the two lines of approach in prediction theory will, thanks to the isomorphism indicated, lead to parallel results.
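For a real series, minimizing (11) over a finite stretch of data is ordinary least squares. The sketch below fits the coefficients $a_0^{(n,k)},\dots,a_n^{(n,k)}$ of (9) by least squares and reports the residual mean square as a finite-sample version of (11). The autoregressive test series $x_t=\phi x_{t-1}+e_t$, its parameter $\phi=0.6$ and the random seed are assumptions made for illustration. For such a series the fitted coefficient of $x_{t-1}$ should be near $\phi^{k+1}$, the other coefficients near 0, and the residual variance near $1+\phi^2+\dots+\phi^{2k}$.

```python
import numpy as np


def fit_predictor(x, n, k):
    """Least-squares a_0..a_n of (9): predict x_{t+k} from 1, x_{t-1}, ..., x_{t-n}.

    Returns the coefficients and the mean squared residual (a finite-sample (11)).
    Assumes a real-valued series.
    """
    x = np.asarray(x, dtype=float)
    m = len(x) - n - k                         # usable time points t = n, ..., len(x) - k - 1
    columns = [np.ones(m)] + [x[n - i: n - i + m] for i in range(1, n + 1)]
    design = np.column_stack(columns)
    target = x[n + k: n + k + m]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    return coef, float(np.mean(residuals ** 2))


def simulate_ar1(phi, length, seed=0):
    """Hypothetical test series x_t = phi x_{t-1} + e_t with unit-variance Gaussian e_t."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(length)
    x = np.empty(length)
    x[0] = e[0] / np.sqrt(1 - phi ** 2)        # start in the stationary distribution
    for t in range(1, length):
        x[t] = phi * x[t - 1] + e[t]
    return x
```

Usage: `coef, v = fit_predictor(simulate_ar1(0.6, 20000), 3, 0)` gives the one-step formula; passing `k=1` gives the two-step formula with a correspondingly larger residual variance.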

In a study of stationary stochastic processes, the author has earlier found a decomposition theorem which has a direct bearing on the prediction problem (see [9], theorem 7). The main purpose of the present note is to develop the corresponding decomposition for a functional sequence of the type (1). Two theorems on this line are given in sections 3-4. The proofs are briefly indicated; for further details, the reader is referred to my treatment of the stationary process [9]. In section 5, the decomposition is applied to the prediction problem. A few comments follow in section 6.

3. Auto-regression analysis of stationary time series. Let $\{x_t\}$ be an infinite sequence (1) such that the conditions (3) are fulfilled. By (9)-(10), the residuals $y_t^{(n,0)}$ will be well-defined for every $n$ and $t$. According to elementary properties of least square residuals, we have

$$(13)\qquad M[y_t^{(n,0)}]=0;\qquad M[\bar{x}_{t-k}\,y_t^{(n,0)}]=0\quad\text{for } k=1,2,\dots,n.$$

Since the minimum variance cannot increase if we replace $n$ by $n+1$, we further have

$$M[|x_t|^2]\ge M[|y_t^{(n,0)}|^2]\ge M[|y_t^{(n+1,0)}|^2]\ge 0.$$

Making $n\to\infty$, we infer that there is a constant $d^2$ such that

$$\lim_{n\to\infty}M[|y_t^{(n,0)}|^2]=d^2\ge 0.$$
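The monotone decrease of the residual variance can be seen from the normal equations alone. Assuming a real, zero-mean series (so that $a_0^{(n,0)}=0$) with known $r_0,r_1,\dots$, the sketch solves for $a_1^{(n,0)},\dots,a_n^{(n,0)}$ and returns $M[|y_t^{(n,0)}|^2]$ for $n=0,1,\dots$. The moving average $x_t=e_t+\theta e_{t-1}$ with unit-variance $e_t$ and $\theta=0.5$ is a hypothetical example; for it $r_0=1+\theta^2$, $r_1=\theta$, $r_k=0$ for $k\ge2$, and the limit $d^2$ equals 1.

```python
import numpy as np


def residual_variances(r, n_max):
    """One-step residual variances M|y_t^{(n,0)}|^2 for n = 0..n_max.

    Assumes a real, zero-mean series with autocovariances r[0], r[1], ..., r[n_max]
    and solves the normal equations of (9) (with k = 0) for each n.
    """
    out = [r[0]]                               # n = 0: no past values, the residual is x_t
    for n in range(1, n_max + 1):
        R = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])
        rhs = np.array(r[1:n + 1])
        a = np.linalg.solve(R, rhs)            # a_1, ..., a_n
        out.append(float(r[0] - rhs @ a))
    return out


theta = 0.5                                    # hypothetical MA(1) parameter
n_max = 30
r = [1 + theta ** 2, theta] + [0.0] * (n_max - 1)
d2 = residual_variances(r, n_max)
print([round(v, 6) for v in d2[:5]])
```

The sequence starts at $M[|x_t|^2]=1.25$, never increases, and approaches $d^2=1$, the variance of the unpredictable part.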


² For real sequences $\{x_t\}$ and $\{\xi_t\}$, this minimization is, of course, nothing else than the method of least squares.
Making use of the Gram-Schmidt orthogonalization procedure it is further possible to show that there exists a sequence $\{y_t\}$ such that

$$\lim_{n\to\infty}M[|y_t^{(n,0)}-y_t|^2]=0.$$

In the usual terminology, the sequence $\{y_t\}$ is the limit in the mean of the sequence $\{y_t^{(n,0)}\}$,

$$(14)\qquad\operatorname*{l.i.m.}_{n\to\infty}\left(\dots,y_{t-1}^{(n,0)},y_t^{(n,0)},y_{t+1}^{(n,0)},\dots\right)=(\dots,y_{t-1},y_t,y_{t+1},\dots).$$

We may remark that (14) does not necessarily imply that $y_t^{(n,0)}$ will, for a fixed $t$, have $y_t$ for an ordinary limit. We also note that the limiting sequence $\{y_t\}$ is not uniquely determined; for instance, the relation (14) remains valid if a finite number of the elements $y_t$ are modified.
As is easily shown, we have

$$(15)\qquad\lim_{n\to\infty}M[|y_t^{(n,0)}|^2]=M[|y_t|^2]=M[\bar{x}_t\,y_t]=d^2\ge 0$$

and [cf. (13)]

$$(16)\qquad M[\bar{x}_{t-k}\,y_t]=0,\qquad k=1,2,\dots.$$

Moreover, the sequence $\{y_t\}$ is non-autocorrelated, i.e.

$$(17)\qquad M[\bar{y}_t\,y_{t+k}]=0,\qquad k=\pm1,\pm2,\dots.$$

In fact, observing that

$$M[\bar{y}_t\,y_{t+k}]=\lim_{n\to\infty}M[\bar{y}_t^{(n,0)}\,y_{t+k}^{(n,0)}],\qquad k=1,2,\dots,$$

and supposing that (17) is not true, we would have

$$(18)\qquad\left|M[\bar{y}_t^{(n_\nu,0)}\,y_{t+k}^{(n_\nu,0)}]\right|>a>0$$

as $\nu$ runs through some sequence $n_1,n_2,\dots$ such that $n_\nu\to\infty$. The relation (18), however, would imply

$$(19)\qquad M[|y_{t+k}^{(n_\nu,0)}-c\,y_t^{(n_\nu,0)}|^2]<d^2$$

for some sufficiently large $\nu$ and for some suitable $c$. Since $y_{t+k}^{(n_\nu,0)}-c\,y_t^{(n_\nu,0)}$ is $x_{t+k}$ minus a linear expression of the type appearing in the right-hand member of (9), its variance cannot fall below $d^2$, so the relation (19) is incompatible with (15). Thus (18) is not possible and (17) must hold good.

Part of the above analysis is summed up in

Theorem 1. Given a time series $\{x_t\}$ which satisfies (3), let $\varepsilon>0$ be arbitrary. Then an integer $n$ and a set of coefficients $a_i^{(n,0)}$ exist for which (9) defines a residual series $\{y_t^{(n,0)}\}$ such that

$$M[y_t^{(n,0)}]=0,\qquad\left|M[\bar{y}_t^{(n,0)}\,y_{t+h}^{(n,0)}]\right|<\varepsilon,\qquad h=\pm1,\pm2,\dots.$$
4. A decomposition theorem. We shall first consider the special case where (15) gives

$$(20)\qquad M[|y_t|^2]=d^2=0,$$

which is the same as

$$\operatorname*{l.i.m.}_{n\to\infty}\left(\dots,y_{t-1}^{(n,0)},y_t^{(n,0)},y_{t+1}^{(n,0)},\dots\right)=(\dots,0,0,0,\dots).$$

In this case we shall say that the sequence $\{x_t\}$ is deterministic,³ the interpretation of this term being as follows: Given the sequence $\{x_t\}$ for all time points up to and including $t-1$, we may, by the use of a finite number of the given values, predict $x_{t+k}$ with any accuracy, i.e., with a residual error of arbitrarily small variance. This can be shown by induction. In fact, suppose that we are able to predict each of $x_t,\dots,x_{t+k-1}$ in such a way that the prediction error has a variance $<\varepsilon$, where $\varepsilon$ is arbitrarily prescribed. Letting $\delta>0$ be arbitrary, we can then find a formula of type (9) which predicts $x_{t+k}$ in terms of the exact values $x_{t+k-1},x_{t+k-2},\dots$ and which gives a residual variance $<\delta/(k+1)^2$. Replacing here $x_{t+k-1},\dots,x_t$ by values so predicted that the residual variances are less than $\delta/\{(k+1)^2|a_i^{(n,0)}|^2\}$, the total error of (9) is a sum of $k+1$ errors, each with standard deviation less than $\sqrt{\delta}/(k+1)$; since the standard deviation of a sum does not exceed the sum of the standard deviations, the total error will have a variance $<\delta$.
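A simple deterministic sequence is $x_t=\cos(\omega t+\varphi)$: it satisfies $x_t=2\cos\omega\,x_{t-1}-x_{t-2}$ exactly, so a formula of type (9) with $n=2$ already has zero residual, and the induction above reduces to iterating this recursion. The sketch, with hypothetical values of $\omega$ and $\varphi$, predicts $x_{t+k}$ from the last two observed values. It only illustrates this special case; in general the theory guarantees an arbitrarily small, not a zero, residual variance.

```python
import math


def predict_deterministic(history, omega, k):
    """Predict x_{t+k} from the last two observations x_{t-2}, x_{t-1} in `history`.

    Iterates x_s = 2 cos(omega) x_{s-1} - x_{s-2}, which holds exactly
    for x_s = A cos(omega s + phase).
    """
    older, newer = history[-2], history[-1]
    c = 2 * math.cos(omega)
    for _ in range(k + 1):                     # steps to t, t + 1, ..., t + k
        older, newer = newer, c * newer - older
    return newer


omega, phase = 0.9, 0.3                        # hypothetical parameters


def x(s):
    return math.cos(omega * s + phase)


history = [x(s) for s in range(50)]            # observations up to t - 1 = 49
print(predict_deterministic(history, omega, 5), x(55))
```

The two printed numbers agree up to rounding error, since each step of the recursion is exact.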

We proceed to the general case, $d^2>0$. According to the above analysis, $y_t$ is that part of $x_t$ which cannot be linearly predicted from the previous observations $x_{t-1},x_{t-2},\dots$. In other words, each time point $t$ brings in an unpredictable, random-like element $y_t$ in the series $\{x_t\}$. Now while from (16) $y_t$ is uncorrelated with the previous observations $x_{t-1},x_{t-2},\dots$, it will in general be correlated with the future observations $x_{t+1},x_{t+2},\dots$. Thus the unpredictable element $y_t$ may be regarded as influencing the future development $x_{t+1},x_{t+2},\dots$ of the series $\{x_t\}$. In order to examine this influence we proceed as follows.

We approximate $x_t$ linearly in terms of $y_t,y_{t-1},\dots,y_{t-n}$, writing

$$x_t=b_0y_t+b_1y_{t-1}+\dots+b_ny_{t-n}+w_t^{(n)}=z_t^{(n)}+w_t^{(n)}.$$

Determining the coefficients $b_k$ by minimizing $M[|w_t^{(n)}|^2]$, the coefficients $b_k$ will, thanks to (16)-(17), be independent of $n$. We obtain

$$b_0=1;\qquad b_k=M[\bar{y}_{t-k}\,x_t]/d^2,\qquad k=1,2,\dots.$$

The sequence $\{z_t^{(n)}\}$ thus being determined for every $n$, it is further easily shown that $\{z_t^{(n)}\}$ converges in the mean, say to $\{z_t\}$,

$$(21)\qquad\operatorname*{l.i.m.}_{n\to\infty}\left(\dots,z_{t-1}^{(n)},z_t^{(n)},z_{t+1}^{(n)},\dots\right)=(\dots,z_{t-1},z_t,z_{t+1},\dots).$$
³ The term is due to J. Doob [1]; in my study [9] I used the term singular.
We may thus write

$$z_t=y_t+b_1y_{t-1}+b_2y_{t-2}+\dots,$$

where the sum converges in the mean. Finally, we write

$$(22)\qquad x_t=z_t+u_t,$$

which gives a decomposition of the series $\{x_t\}$ into two components $\{z_t\}$ and $\{u_t\}$.

In the decomposition (22) the component $z_t$ is that part of $x_t$ which is linearly built up by the unpredictable elements $\{y_t\}$ up to and including the time point $t$. From (17) we know that the sequence $\{y_t\}$ is non-autocorrelated. It can further be shown that the square modulus sum of the coefficients $b_k$ is convergent,

$$\sum_{k=0}^{\infty}|b_k|^2<\infty.$$

As to the component $u_t$, it can be shown that $\{u_t\}$ is deterministic. More precisely, we have

$$\operatorname*{l.i.m.}_{n\to\infty}\left\{u_t-\left(a_0^{(n,0)}+a_1^{(n,0)}u_{t-1}+\dots+a_n^{(n,0)}u_{t-n}\right)\right\}=(0),$$

where the $a_i^{(n,0)}$ are the same as the minimizing coefficients of (9). It can further be shown that $u_t$ is uncorrelated with $y_{t+k}$ and $z_{t+k}$ for all $k$,

$$M[\bar{u}_t\,y_{t+k}]=M[\bar{u}_t\,z_{t+k}]=0\qquad(k=0,\pm1,\pm2,\dots).$$

Summing up the above results, we obtain

Theorem 2. Any time series $\{x_t\}$ which satisfies the conditions (3) allows the decomposition

$$(23)\qquad\{x_t\}=\{z_t+u_t\},$$

with

$$\{z_t\}=\operatorname*{l.i.m.}_{n\to\infty}\{y_t+b_1y_{t-1}+b_2y_{t-2}+\dots+b_ny_{t-n}\},$$

where the series $\{y_t\}$, $\{z_t\}$ and $\{u_t\}$ have the following properties:

A. The elements $y_t$, $z_t$ and $u_t$ are obtained from $x_t,x_{t-1},\dots$ by the limit formulae (14), (21) and (22).

B. The series $\{y_t\}$ has zero mean,

$$M[y_t]=0,$$

is non-autocorrelated,

$$M[\bar{y}_t\,y_{t+k}]=0,\qquad k=\pm1,\pm2,\dots,$$

and is uncorrelated with $\{x_{t-1}\},\{x_{t-2}\},\dots$,

$$M[\bar{x}_{t-k}\,y_t]=0,\qquad k=1,2,\dots.$$
C. The series $\{u_t\}$ is uncorrelated with $\{y_t\}$ and $\{z_t\}$,

$$M[\bar{u}_t\,y_{t+k}]=M[\bar{u}_t\,z_{t+k}]=0\qquad(k=0,\pm1,\pm2,\dots).$$

D. The series $\{u_t\}$ is deterministic.

5. Application to the problem of prediction. In section 2 we have considered the problem of predicting $x_{t+k}$ linearly in terms of $x_{t-1},x_{t-2},\dots$. Now it is seen that theorem 2 gives the following formula for predicting $x_{t+k}$ with an error of minimal variance,

$$\text{pred. }x_{t+k}=u_{t+k}+b_{k+1}y_{t-1}+b_{k+2}y_{t-2}+\dots.$$

In fact, by theorem 2, A and D, the right-hand member can be calculated with any prescribed accuracy from a finite set of observations $x_{t-1},x_{t-2},\dots,x_{t-N}$, where $N$ of course depends on the accuracy desired; on the other hand, the prediction error being

$$y_{t+k}+b_1y_{t+k-1}+\dots+b_ky_t,$$

we infer from theorem 2 (B) that this error is of minimal variance,

$$M[|x_{t+k}-\text{pred. }x_{t+k}|^2]=\left(1+|b_1|^2+\dots+|b_k|^2\right)d^2.$$
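The error formula can be checked on a series whose decomposition is known. The sketch assumes a hypothetical moving average $x_t=y_t+0.5y_{t-1}-0.3y_{t-2}$ with unit-variance innovations $y_t$. Since $1+0.5z-0.3z^2$ has both roots outside the unit circle, $y_t$ plays the role of the unpredictable element of theorem 2, with $u_t=0$, $b=(1,0.5,-0.3)$ and $d^2=1$. The sketch compares $(1+|b_1|^2+\dots+|b_k|^2)d^2$ with the mean square error of the predictor $b_{k+1}y_{t-1}+b_{k+2}y_{t-2}+\dots$ on simulated data; the sample size and seed are arbitrary.

```python
import numpy as np


def prediction_error_variance(b, k, d2):
    """(1 + |b_1|^2 + ... + |b_k|^2) d^2, with b[0] = b_0 = 1 and b_j = 0 beyond len(b)."""
    return d2 * sum(abs(b[j]) ** 2 for j in range(min(k, len(b) - 1) + 1))


# Hypothetical example: x_t = y_t + 0.5 y_{t-1} - 0.3 y_{t-2}, innovations with d^2 = 1.
b = [1.0, 0.5, -0.3]
rng = np.random.default_rng(2)
y = rng.standard_normal(50_000)
x = np.convolve(y, b)[: len(y)]                # x_t = sum_j b_j y_{t-j}, with y_t = 0 for t < 0


def empirical_error_variance(k, burn_in=len(b)):
    """Mean square of x_{t+k} - (b_{k+1} y_{t-1} + b_{k+2} y_{t-2} + ...), with u_t = 0."""
    tail = [0.0] * (k + 1) + b[k + 1:]         # keep only b_{k+1}, b_{k+2}, ...
    pred = np.convolve(y, tail)[: len(y)]
    return float(np.mean((x - pred)[burn_in:] ** 2))


for k in range(4):
    print(k, prediction_error_variance(b, k, 1.0), empirical_error_variance(k))
```

From $k=2$ on the error variance equals the full variance of $x_t$, $1+0.25+0.09=1.34$: beyond the length of the moving average, the past carries no information about $x_{t+k}$.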

6. Comments. As mentioned in section 2, the above theorem 2 is the analogue of a theorem on the decomposition of a stationary stochastic process given by the author previously (see [9], theorem 7). The starting point is then to apply formula (9), not as above to the same sequence $\{x_t\}$ for varying $t$, but to all realizations $\{x_t\}$ of the process, holding $t$ fixed. The close connection between the decomposition in the two approaches is further brought out by the following theorem.

Theorem 3. Given a stochastic process

$$\dots,\xi(t-1),\xi(t),\xi(t+1),\dots,$$

which is stationary in the sense of (5), let $\{x_t\}$ be an individual realization of this process. Then $\{x_t\}$ will with probability 1 allow the decomposition of theorem 2.

In fact, according to the ergodic theorem of Birkhoff-Khintchine,⁴ the averages (2) will exist with probability 1, and so theorem 3 follows from theorem 2. It should be observed that the coefficients $b_k$ will in general vary from one realization to another.

The theory of the decomposition (23) has been carried further in a brilliant study by A. Kolmogoroff [3]. His analysis deals with the general case of a stationary sequence in a Hilbert space. Establishing a decomposition of type (23) for such sequences, Kolmogoroff also shows that the decomposition is uniquely determined by properties corresponding to A-D. Making use of the powerful methods of spectral analysis of linear transformations in Hilbert space, Kolmogoroff further presents a highly developed theory of the decomposition.

⁴ See A. Kolmogoroff [2]. His proof refers to averages (2) of the special type where $t_1$ is held fixed while $t_2\to\infty$. According to the stationarity, however, the average exists, and is the same, when $t_2$ is fixed and $t_1\to-\infty$, and so the general average (2) will likewise exist.

As immediate corollaries of this general theory Kolmogoroff [4] obtains corresponding results for a stationary stochastic process $\{\xi_t\}$ such as (4). Now, thanks to our lemma in section 2, similar theorems hold good for the functional sequence (1). These results include detailed theorems on the connection between the decomposition (23) and, on the other hand, the function $F(\lambda)$ which by (6) generates the coefficients $r_k$. For example, it turns out that $\{x_t\}$ is completely deterministic if the derivative $F'(\lambda)$ vanishes over an interval of positive measure. An explicit formula for the coefficients $b_k$ in terms of the function $F(\lambda)$ may also be obtained. For proofs and further results, we must refer to Kolmogoroff's papers [3]-[4].

The theory of the decomposition (23) has later been generalized in various directions. V. Zasuhin [11] and J. Doob [1] have shown that the decomposition applies to multi-dimensional stationary sequences. As shown by the present author [10], the decomposition may be employed for the analysis of linear equation systems with an infinite number of unknowns. This device makes use of the decomposition of non-stationary sequences, a generalization indicated also by M. Loève [5].
GENERALIZATION TO N DIMENSIONS OF INEQUALITIES OF THE TCHEBYCHEFF TYPE

By Burton H. Camp
Wesleyan University

1. Summary. The Tchebycheff statistical inequality and its generalizations are further generalized so as to apply equally well to $n$-dimensional probability distributions. Comparisons may be made with other generalizations [1], [2] that have been developed recently for the two-dimensional case. The inequalities given in this paper are generally as close as the most favorable corresponding inequalities that exist for the one-dimensional case, and in many simple cases they are closer than those that have been given heretofore for two dimensions. In a special case the upper bound of our inequality is actually attained. The theory contains also a less important generalization in one dimension.

2. Introduction. It is necessary to introduce a new kind of moment, to be called a "contour" moment, which is a generalization of the usual one-dimensional moment. If we consider first a simple two-dimensional frequency surface, $y = f(t_1, t_2)$, we may think of $y$ as a function of a single variable, $x$, where $x$ is the area of the contour on that surface at the $y$ level. This function may be defined so that it is monotonic decreasing and has other simple characteristics. Then we define the $r$th contour moment as

$$\mu_r = \int_0^{x_0} x^r y \, dx,$$

and then the generalization of the Tchebycheff-type inequalities follows easily. This theory can be applied equally well to almost any single-valued function of $n$ variables which is limited and integrable in the sense of Lebesgue. Therefore the theory will be enunciated initially in a very general form. The reasons for the initial statements will be indicated only briefly because a detailed discussion of quite similar ideas has been given by this author in another paper [3], where he applied the same general principle to obtain generalizations of certain theorems in integration theory.

3. Preliminary theory. Let $f(t_1, \dots, t_n)$ be a probability distribution with limited upper bound $L$ and defined at all points of infinite $n$-space, which is to be denoted by $T$, $dT$ being the Lebesgue measure of a differential element. We thus assume that $0 \le f(t_1, \dots, t_n) \le L$, that $f$ has a Lebesgue integral in $T$, and that

$$\int_T f \, dT = 1.$$

Let $Q_\lambda$ denote the set of points in $T$ where $f > \lambda$ $(0 \le \lambda \le L)$, and let $x_\lambda$ be the measure of $Q_\lambda$; $Q_\lambda$ is known to be measurable. Therefore $x_L = 0$, $x_0 \le \infty$,
and for each $\lambda$ there exists a unique $Q_\lambda$ and therefore a unique $x_\lambda$. This means that $x$ is a single-valued function of $\lambda$ and that it exists (or is positive infinite) for every value of $\lambda$ in the interval $(0 \le \lambda \le L)$. If $\lambda' > \lambda$, then $x_{\lambda'} \le x_\lambda$. This means that $x$ is a monotonic decreasing function of $\lambda$. It need not be continuous; that is, it may be asymptotic to the line $\lambda = 0$, and it may have finite discontinuities or "jumps". Also there may be an enumerably infinite number of $\lambda$ intervals in which $x$ is constant. It follows that $\lambda$ is a monotonic decreasing function of $x$ in the interval $(0 \le x \le x_0 \le \infty)$, but it may not exist (in intervals where $x$ has jumps), and it may be multiple valued (at points where $x$ is constant). We now let $y(x) = \lambda$, except that if $\lambda$ is multiple valued at any point $x$ we let $y$ have the minimum value of $\lambda$ at that point. Any other value would do equally well because the total measure of such points is zero and they can be left out of the integrals that follow. If $\lambda$ does not exist in an $x$ interval, we let $y$ have in that interval the value which it has at the beginning of the interval; this is a $\lambda$ point where $x$ has a jump. We have thus defined $y$ as a single-valued monotonic decreasing function of $x$ in the interval $(0 \le x \le x_0 \le \infty)$, with $0 \le y \le L$.
It follows from Lebesgue's theory that

$$\int_0^{x_\lambda} y(x) \, dx = \int_{Q_\lambda} f \, dT \quad (0 < \lambda \le L), \qquad \int_0^{x_0} y(x) \, dx = \int_T f \, dT = 1.$$

Finally we restrict our function $f$ so that there shall be at most a finite number of points $x$ where $\lambda$ is multiple valued (intervals of $\lambda$ over which $x$ is constant), and hence the number of discontinuities of $y$ will be finite. This restriction may not be necessary, but it is convenient and not embarrassing in applications.

4. Contour moments. The $r$th contour moment is denoted by $\mu_r$. The contour standard deviation is denoted by $\sigma$. We define

$$\mu_r = \int_0^{x_0} x^r y \, dx.$$

It follows that $\mu_0 = 1$, and that

$$\sigma^2 = \mu_2 = \int_0^{x_0} x^2 y \, dx.$$

We shall also let $\alpha_{2r} = \mu_{2r}/\sigma^{2r}$. We now assume that $r$ is either zero or a positive integer, but in much of what follows this assumption is not necessary.

Example 1. Let $f(t_1, t_2) = (2\pi)^{-1} e^{-(t_1^2 + t_2^2)/2}$. The equation $f(t_1, t_2) = \lambda$ defines a circular contour whose area is $x = \pi(t_1^2 + t_2^2) = -2\pi \log 2\pi\lambda$. Hence $y = \lambda = (2\pi)^{-1} e^{-x/2\pi}$, and

$$\mu_r = \int_0^\infty x^r y \, dx = (2\pi)^r r!, \qquad \sigma^2 = 8\pi^2, \qquad \alpha_{2r} = (2r)!/2^r.$$
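As a numerical check on these definitions, the Python sketch below approximates $y(x)$ by sorting the values of a density tabulated on a fine grid into decreasing order, which is a discrete version of the construction in section 3. It then evaluates the contour moments of Example 1. The grid spacing, the truncation of the plane to a square, and the helper name `contour_moments` are assumptions made only for this illustration.

```python
import math


def contour_moments(values, cell_area, orders):
    """Approximate mu_r = integral_0^{x0} x^r y(x) dx from gridded density values.

    Sorting the cell values in decreasing order gives a step-function
    approximation to y(x): the k-th largest value occupies the strip
    k*cell_area <= x < (k + 1)*cell_area of contour area.
    """
    ys = sorted(values, reverse=True)
    moments = {}
    for r in orders:
        total = 0.0
        for k, y in enumerate(ys):
            x_mid = (k + 0.5) * cell_area  # midpoint of this strip of contour area
            total += x_mid ** r * y * cell_area
        moments[r] = total
    return moments


# Example 1: f(t1, t2) = exp(-(t1^2 + t2^2)/2) / (2*pi), truncated to [-L, L]^2.
h, L = 0.05, 7.0
m = int(round(2 * L / h))
centers = [-L + (i + 0.5) * h for i in range(m)]
grid = [
    math.exp(-(t1 * t1 + t2 * t2) / 2) / (2 * math.pi)
    for t1 in centers
    for t2 in centers
]

mu = contour_moments(grid, h * h, orders=[0, 1, 2, 4])
sigma2 = mu[2]                # contour variance, expected near 8*pi^2
alpha4 = mu[4] / sigma2 ** 2  # expected near (2r)!/2^r = 6 for r = 2
print(mu[0], mu[1] / (2 * math.pi), sigma2 / (8 * math.pi ** 2), alpha4)
```

The first three printed numbers should all be close to 1, and `alpha4` should be close to 6. This is consistent with $\mu_r = (2\pi)^r r!$ and $\alpha_{2r} = (2r)!/2^r$.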

5. Contour moments and one-dimensional moments. If $n = 1$ and if $f(t) = f(-t)$, then

$$\mu_{2r} = \int_0^{x_0} x^{2r} y \, dx = 2\int_0^\infty (2t)^{2r} f(t) \, dt = 2^{2r} \mu'_{2r},$$

where $\mu'_{2r}$ is an ordinary moment. Hence also $\sigma = 2\sigma'$ and $\alpha_{2r} = \mu_{2r}/\sigma^{2r} = 2^{2r}\mu'_{2r}/(2^{2r}\sigma'^{2r}) = \alpha'_{2r}$. It is to be noticed that, although $\alpha_{2r} = \alpha'_{2r}$, $\mu_{2r} \ne \mu'_{2r}$. One could alter the definition so that these two moments would be equal by inserting into the definition of contour moments the factor $2^{-r}$, that is, by using $x/2$ in place of $x$, but this would introduce a slight complication for a doubtful advantage. Although it would seem to be desirable to define the even contour moments $\mu_{2r}$ so that they would become the ordinary moments $\mu'_{2r}$ in the symmetrical one-dimensional case, such a definition would not make the two corresponding odd moments equal, and it would not make the two even moments equal in the non-symmetrical one-dimensional case. So it seems better not to introduce this factor, but to take note of the relationships that hold in the one-dimensional case.
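The relations $\mu_{2r} = 2^{2r}\mu'_{2r}$ and $\alpha_{2r} = \alpha'_{2r}$ can be checked numerically for a symmetric unimodal density. The Python sketch below takes the standard normal density as an assumed example on a fine grid. It approximates the contour moments by sorting the grid values into decreasing order and compares them with the ordinary moments of the same density.

```python
import math


def contour_moment(values, cell, r):
    """mu_r from gridded density values: the k-th largest value sits at x ~ (k + 1/2) * cell."""
    ys = sorted(values, reverse=True)
    return sum(((k + 0.5) * cell) ** r * y * cell for k, y in enumerate(ys))


h, L = 0.001, 10.0
ts = [-L + (i + 0.5) * h for i in range(int(round(2 * L / h)))]
f = [math.exp(-t * t / 2) / math.sqrt(2 * math.pi) for t in ts]

# Ordinary moments mu'_{2r} of the same (zero-mean) density.
ordinary = {r: sum(t ** r * v * h for t, v in zip(ts, f)) for r in (2, 4)}
contour = {r: contour_moment(f, h, r) for r in (2, 4)}

print(contour[2] / ordinary[2])  # should be close to 2**2 = 4
print(contour[4] / ordinary[4])  # should be close to 2**4 = 16
print(contour[4] / contour[2] ** 2, ordinary[4] / ordinary[2] ** 2)  # both close to 3
```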

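Theorem. Let

$$P_\lambda = \int_0^{x_\lambda} y \, dx,$$

where $\lambda$ is such that $x_\lambda = \delta\sigma$. Then

$$1 - P_\lambda \le \frac{\alpha_{2r}}{\delta^{2r}} \left( \frac{2r}{2r+1} \right)^{2r}.$$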


Corollary 1. In particular, $1 - P_\lambda \le \alpha_{2r}/\delta^{2r}$.

Corollary 2. If $r = 1$, $1 - P_\lambda \le 4/(9\delta^2)$. This theorem and these two corollaries are minor generalizations even of the corresponding one-dimensional inequalities, for it is no longer assumed that the probability distribution $f(t)$ has but one mode.

Proof of Theorem. Let $g(x) = y(x)$ if $0 \le x \le x_0 \le \infty$, let $g(x) = y(-x)$ if $-\infty \le -x_0 \le x \le 0$, and let $g(x) = 0$ elsewhere in $(-\infty, \infty)$. Then $g(x)$ has all the properties explicitly required of $f(x)$ in a former paper by this author [4] in which this theorem was proved for the one-dimensional case. That is, $g(x)$ is a frequency function whose mean is zero, and

$$\int_{-\infty}^{\infty} g(x) \, dx = 2, \qquad \int_{\delta\sigma}^{\infty} g(x) \, dx$$

is the probability that $|x| > \delta\sigma$; $g(x)$ is a monotonic decreasing function of $|x|$ for all values of $x$, and is symmetrical with respect to the central ordinate. Therefore, transforming the symbols of that paper to our present notation, we have

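$$1 - P_\lambda \le \frac{\alpha'_{2r}}{\delta^{2r}} \left( \frac{2r}{2r+1} \right)^{2r},$$

where the primed quantities refer to the frequency function $g(x)/2$; in particular

$$\sigma'^2 = \frac{1}{2}\int_{-\infty}^{\infty} x^2 g(x) \, dx = \int_0^{x_0} x^2 y \, dx = \sigma^2.$$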
Similarly $\mu'_{2r} = \mu_{2r}$, $\alpha'_{2r} = \alpha_{2r}$, and finally

$$1 - P_\lambda \le \frac{\alpha_{2r}}{\delta^{2r}} \left( \frac{2r}{2r+1} \right)^{2r}.$$
This proves the theorem except that there is one exceptional case that requires attention. In the proof of the theorem in the paper just referred to, the author assumed that the function corresponding to our present $g(x)$ was continuous. At that time a "frequency" function was often thought of as determined by a smooth curve approximating a histogram, which implied even the existence of derivatives, and so continuity was not added to the explicit requirement that the function be a "frequency" function. This condition was, however, explicitly introduced in the lemma on which the proof of the theorem was based, and so we do now have to consider separately the case where $y$, and hence $g$, may have a finite number of jumps. It is quite easy to handle this case as the limiting form of a continuous case. In that lemma it was also required that $d^2Q/dt^2$ should exist and be non-negative, which would imply that we now have to make the requirement that $y$ (corresponding to $dQ/dt$) shall have a non-negative first derivative. On examination of the proof, however, it will be observed that this is not necessary, since $y$ is monotonic decreasing and continuous. That is, in the lemma the only use made of the condition $d^2Q/dt^2 \ge 0$ was that the function $Q(t)$ should determine a curve which would never be concave down. But for this it is sufficient that $dQ/dt$ be continuous and monotonic increasing, and these conditions are now satisfied by the function which plays the rôle of $Q$ in the present discussion. This function will now be defined as


r 


y{x) dx. 


Let $\gamma(x)$ be a continuous function defined as equal to $g(x)$ except in the neighborhood of the points of finite discontinuity. Near such points it is to be so defined that it shall have all the properties just required of $g(x)$, and in addition so that, for any prescribed $R > 1$ and $\epsilon > 0$,

$$\int_{-\infty}^{\infty} x^{2r} \gamma(x) \, dx = \int_{-\infty}^{\infty} x^{2r} g(x) \, dx + \eta_r \quad (1 \le r \le R),$$

$$\int_{\delta\sigma}^{\infty} \gamma(x) \, dx = \int_{\delta\sigma}^{\infty} g(x) \, dx + \eta,$$

where $|\eta_r|, |\eta| < \epsilon$. It is obvious that such a definition of $\gamma$ may be made in many ways, and one of them is by making use of a linear function in the neighborhood of each point of discontinuity. Since $\gamma(x)$ now satisfies all the conditions of the author's earlier paper, the corresponding inequality is true:
where


Hence 




f x\ dx. 
Jo 


(£ a dX “ v ) ( 5 ' 2r 2r l J (ff * ~ VlY - ^ ~ *• 


Let $\epsilon$ approach zero and we have, as desired,

$$1 - P_\lambda \le \frac{\alpha_{2r}}{\delta^{2r}} \left( \frac{2r}{2r+1} \right)^{2r}.$$


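Example 2. Let

$$f(t_1, \dots, t_n) = A \exp\left[-\frac{1}{2}\left(\frac{t_1^2}{\sigma_1^2} + \frac{t_2^2}{\sigma_2^2} + \cdots + \frac{t_n^2}{\sigma_n^2}\right)\right], \qquad A = (2\pi)^{-n/2} (\sigma_1 \cdots \sigma_n)^{-1}.$$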


This is a form into which the general correlation solid may be put by means of a linear transformation. Since $P_\lambda$ is a ratio between two parts of such a solid and since this ratio is preserved under a linear transformation, the more general case may be transformed into this one, or even, as will appear shortly, into the simpler one where all the standard deviations are unity. If $f = \lambda$ the contour is the ellipsoid

$$\frac{t_1^2}{\sigma_1^2} + \cdots + \frac{t_n^2}{\sigma_n^2} = 2 \log \frac{A}{\lambda}.$$


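The volume of this ellipsoid is

$$x = h \left(2 \log \frac{A}{\lambda}\right)^{n/2}, \qquad h = V_0 \sigma_1 \cdots \sigma_n, \qquad V_0 = \frac{2\pi^{n/2}}{n\,\Gamma(n/2)}.$$

Hence $y = A e^{-\frac{1}{2}(x/h)^{2/n}}$, and

$$\mu_r = \int_0^\infty x^r y \, dx = nAh^{r+1} 2^{n(r+1)/2 - 1} \Gamma\left(\frac{nr+n}{2}\right) = \left(\frac{2^{n/2+1}\pi^{n/2}\sigma_1 \cdots \sigma_n}{n}\right)^r \frac{\Gamma\left(\frac{nr+n}{2}\right)}{[\Gamma(n/2)]^{r+1}}.$$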


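Putting $r = 2$ we obtain

$$\sigma^2 = \frac{\pi^n 2^{n+2} (\sigma_1 \cdots \sigma_n)^2 \, \Gamma(3n/2)}{n^2 [\Gamma(n/2)]^3},$$

and then

$$\alpha_{2r} = \frac{\mu_{2r}}{\sigma^{2r}} = \frac{\Gamma\left(\frac{2nr+n}{2}\right)}{\Gamma(n/2)} \left[\frac{\Gamma(n/2)}{\Gamma(3n/2)}\right]^r.$$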
Our inequality becomes $1 - P_\lambda \le J$, where

$$J = \frac{\alpha_{2r}}{\delta^{2r}} \left(\frac{2r}{2r+1}\right)^{2r}$$

or 1, whichever is smaller.

Typical numerical values of $\alpha_{2r}$ and of $J$ are given in Tables I and II.


TABLE I

Values of α_2r

 n    α_2r                             α_2     α_4      α_6
 1    1·3···(2r - 1)                    1       3        15
 2    (2r)!/2^r                         1       6        90
 3    3·5·7···(6r + 1)/(3·5·7)^r        1      12.26     566
 4    (4r + 1)!/(5!)^r                  1      25.20     3604
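The entries of Table I follow from the expression for $\alpha_{2r}$ derived above for the normal correlation solid. The short Python check below is only a sketch, and `alpha_2r` is a name chosen here. It evaluates $\Gamma((2nr+n)/2)/\Gamma(n/2)\cdot[\Gamma(n/2)/\Gamma(3n/2)]^r$ with `math.lgamma` so that large arguments do not overflow.

```python
import math


def alpha_2r(n, r):
    """alpha_{2r} = Gamma((2nr + n)/2) / Gamma(n/2) * (Gamma(n/2) / Gamma(3n/2))**r."""
    lg = math.lgamma
    log_value = (lg((2 * n * r + n) / 2) - lg(n / 2)
                 + r * (lg(n / 2) - lg(3 * n / 2)))
    return math.exp(log_value)


for n in (1, 2, 3, 4):
    print(n, [round(alpha_2r(n, r), 2) for r in (1, 2, 3)])
```

For $n = 3$ and $n = 4$ the $r = 3$ values come out near 565.6 and 3603.6, which the table rounds to 566 and 3604.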


TABLE II

Values of J

 δ    n    r      J
 1    1    1    0.444
           2    1.000
 1    2    1    0.444
           2    1.000
 2    1    1    0.111
           2    0.077
           3    0.093
 3    1    1
           2
           3
           4
           5
 3    2    1    0.049
           2    0.030
           3    0.049
 3    3    1    0.049
           2    0.062
           3    0.308
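The entries of Table II can be recomputed from the formula for $J$. The Python sketch below evaluates $J = \min\{1, \alpha_{2r}\,\delta^{-2r}\,(2r/(2r+1))^{2r}\}$ using the rounded values of $\alpha_{2r}$ from Table I. The helper name `camp_bound` is introduced here for illustration only.

```python
# Rounded values of alpha_{2r} from Table I, keyed by (n, r).
ALPHA = {
    (1, 1): 1, (1, 2): 3, (1, 3): 15,
    (2, 1): 1, (2, 2): 6, (2, 3): 90,
    (3, 1): 1, (3, 2): 12.26, (3, 3): 566,
}


def camp_bound(delta, r, alpha_2r):
    """J = alpha_{2r} / delta^{2r} * (2r / (2r + 1))^{2r}, or 1 if that is smaller."""
    value = alpha_2r / delta ** (2 * r) * (2 * r / (2 * r + 1)) ** (2 * r)
    return min(value, 1.0)


for delta, n, r in [(1, 1, 1), (1, 1, 2), (2, 1, 1), (2, 1, 2), (2, 1, 3),
                    (3, 2, 1), (3, 2, 2), (3, 2, 3), (3, 3, 1), (3, 3, 2), (3, 3, 3)]:
    print(delta, n, r, round(camp_bound(delta, r, ALPHA[(n, r)]), 3))
```

With $r = 1$ the formula reduces to Corollary 2, $4/(9\delta^2)$, because $\alpha_2 = 1$ for every $n$.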
Let us now compare $J$ with the true value of $(1 - P_\lambda)$ in one of these cases, namely when $\delta = 3$ and $n = 3$. The true value is given by

$$1 - P_\lambda = 1 - \int_0^{\delta\sigma} A e^{-\frac{1}{2}(x/h)^{2/3}} \, dx,$$

where now $\sigma = 4\pi\sqrt{105}\,\sigma_1\sigma_2\sigma_3/3$ and $h = 4\pi\sigma_1\sigma_2\sigma_3/3$. The integral may be evaluated by means of the transformation $t = (x/h)^{1/3}$, which reduces it to an integral of $t^2 e^{-t^2/2}$, and a table of that integral. We obtain $1 - P_\lambda = 0.0205$. This is the true value, to be compared with the approximation $J = 0.049$. The closeness of this approximation is similar to that which may be obtained for the normal law by using the corresponding inequalities for one dimension. To illustrate this, we find from the usual tables that, if for the normal law $1 - P_\lambda = 0.0205$, then $\delta = 2.32$. Hence the corresponding inequality is (for $r = 2$) $1 - P_\lambda \le 0.042$.
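The true value quoted above can be reproduced without tables. On the contour of volume $x$ the quantity $\sum t_i^2/\sigma_i^2$ equals $(x/h)^{2/n}$. So $1 - P_\lambda$ is the probability that a $\chi^2$ variate with $n$ degrees of freedom exceeds $(\delta\sigma/h)^{2/n}$. The formulas above give $\sigma^2/h^2 = 2^n\Gamma(3n/2)/\Gamma(n/2)$, which equals 105 for $n = 3$. The Python sketch below carries this out as a check of our own, using the closed-form $\chi^2$ tail for 3 degrees of freedom.

```python
import math


def chi2_3_sf(c):
    """P(chi-square with 3 d.f. > c) = 2[1 - Phi(z)] + 2 z phi(z), with z = sqrt(c)."""
    z = math.sqrt(c)
    return math.erfc(z / math.sqrt(2)) + math.sqrt(2 / math.pi) * z * math.exp(-c / 2)


n, delta = 3, 3.0
sigma_over_h = math.sqrt(2 ** n * math.gamma(3 * n / 2) / math.gamma(n / 2))  # sqrt(105)
threshold = (delta * sigma_over_h) ** (2 / n)  # chi-square value on the contour x = delta*sigma
true_tail = chi2_3_sf(threshold)               # exact 1 - P_lambda
J = min(1.0, 4 / (9 * delta ** 2))             # Corollary 2 (r = 1)
print(round(sigma_over_h ** 2, 6), round(true_tail, 4), round(J, 3))
```

The exact tail comes out close to 0.020, in line with the tabular value 0.0205 above, while $J$ is about 0.049.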

We shall now show that the upper bound of our inequality is actually attained in a special case. Let $f(t_1, \dots, t_n) = 2^{-n}$ in the region $(-1 \le t_1, \dots, t_n \le 1)$, and let $f = 0$ elsewhere. For this case we shall have $x = 0$ when $\lambda = 2^{-n}$, and $x = 2^n$ when $0 \le \lambda < 2^{-n}$. Therefore $y = 2^{-n}$ if $0 \le x < 2^n$, and $y = 0$ if $2^n \le x$. Hence $\sigma = 2^n/\sqrt{3}$, $\mu_0 = 1$, and the true value of $(1 - P_\lambda)$ is $1 - \delta/\sqrt{3}$; when $\delta = 2/\sqrt{3}$, this true value is $1/3$. The appropriate inequality is $1 - P_\lambda \le 4/(9\delta^2)$, and when $\delta = 2/\sqrt{3}$ the right-hand side of this inequality is also equal to $1/3$. These relationships are true for all values of $n$.
BOUNDARIES OF MINIMUM SIZE IN BINOMIAL SAMPLING

By R. L. Blackett 
University of Liverpool 

1. Introduction. Much attention has recently been concentrated on the problems arising when sampling a binomial population, since this is thought to form a suitable model for certain industrial and biological procedures. A general discussion of such procedures as applied in industry has been given by Barnard [2], and various particular cases have received detailed treatment by Burman [3], Stockman and Armitage [6], and Anscombe [1]. Unbiased estimation of the population parameter (the "fraction defective") has been investigated by Girshick, Mosteller and Savage [4] and by Wolfowitz [7]. A paper by Haldane [5] is also relevant.

For such sampling procedures it is necessary to find the probabilities of accepting or rejecting material with a particular fraction defective, to calculate the average sample size, and to form an estimate of the fraction defective when sampling terminates. All three characteristics may be expressed in terms of quantities $N(x, y)$, defined in section 3, so that once these are known, the fundamental properties of the scheme are known.

Here we present a method for determining the $N(x, y)$, investigate the conditions under which it is valid, relate the method to the estimation problem, and exemplify its application. The schemes to which the method can successfully be applied are of a special type (to which the title refers) and include all inspection procedures with a finite upper limit to the sample size likely to be used in practice. Other schemes, when dissected in a manner similar to that used by Stockman and Armitage, can doubtless be formulated as an aggregate of the special types.

2. Nomenclature. Our nomenclature differs in some respects from that of Girshick, Mosteller and Savage, although the same collection of terms is employed. References to their paper should therefore be followed by a comparison of the terminology.

Taking a sample of one from a binomial population consists in observing either of two events, whose probabilities are $p$ and $1 - p$ $(p \ne 0 \text{ or } 1)$. The results of successive samples of one can be represented by the path of a particle in a two-dimensional lattice of points with non-negative integer co-ordinates. This particle starts at the origin $O$ and at any point $(x, y)$ travels to $(x + 1, y)$ if the event whose probability is $p$ has occurred, and otherwise to $(x, y + 1)$. Sampling terminates when the particle reaches a boundary point, and the set of such points is denoted by $B$. Any point which can be reached during sampling, including the boundary points, is accessible, and any path from the origin to a point of $B$ which can be traversed during sampling is admissible; all other points are inaccessible and all other paths inadmissible. The index of a point is the sum of its co-ordinates.
It will probably help to note in particular that whereas Girshick, Mosteller and Savage used $p$ to correspond to events causing the $y$ co-ordinate to increase, we use it for $x$.

3. Determination of $N(x, y)$. The set $B$ determines the sampling scheme, and we are concerned with schemes in which all points of index greater than $n$, the finite maximum index of points in $B$, are inaccessible. This condition guarantees that, if $N(x, y)$ denotes the number of admissible paths from the origin to a point $(x, y)$ of $B$,

$$\sum_B N(x, y) p^x (1 - p)^y = 1,$$

the summation being over all boundary points. Consequently, to determine $N(x, y)$, equate coefficients of powers of $p$ in this identity, the coefficient of $p^0$ in the left-hand side being 1 and all others zero. When all the $N(x, y)$ are known, the probability of reaching any subset of $B$ can be calculated and the characteristics of the scheme found.
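The identity can be checked directly by counting admissible paths. The Python sketch below propagates path counts through the lattice in order of index, stopping at boundary points, and then verifies the identity with exact rational arithmetic. The boundary used is a hypothetical curtailed scheme of our own that stops at the second $x$-event or the third $y$-event. It has maximum index $n = 4$ and exactly $n + 1 = 5$ points.

```python
from fractions import Fraction


def count_paths(boundary, max_index):
    """Return N(x, y), the number of admissible paths from (0, 0) to each point of B.

    Assumes every accessible point of index max_index belongs to the boundary,
    so no path needs to be followed beyond that index.
    """
    counts = {(0, 0): 1}
    for s in range(max_index):              # visit points in order of index x + y
        for x in range(s + 1):
            point = (x, s - x)
            c = counts.get(point, 0)
            if c == 0 or point in boundary:  # inaccessible, or sampling stops here
                continue
            for nxt in ((x + 1, s - x), (x, s - x + 1)):
                counts[nxt] = counts.get(nxt, 0) + c
    return {point: counts.get(point, 0) for point in boundary}


# Hypothetical scheme: stop when x reaches 2 or y reaches 3.
B = {(2, 0), (2, 1), (2, 2), (0, 3), (1, 3)}
N = count_paths(B, max_index=4)

for p in (Fraction(1, 2), Fraction(1, 3), Fraction(2, 7)):
    total = sum(n_xy * p ** x * (1 - p) ** y for (x, y), n_xy in N.items())
    assert total == 1  # sum over B of N(x, y) p^x (1 - p)^y is identically 1
print(sorted(N.items()))
```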

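Sometimes it will be convenient to use

$$\sum_B N(x, y) (1 - q)^x q^y = 1,$$

where $q = 1 - p$, but the resulting set of equations cannot be independent of the first set, since if

$$\sum_{i=0}^{n} a_i p^i \equiv \sum_{j=0}^{n} b_j (1 - p)^j,$$

then $a_i = (-1)^i \sum_{j=i}^{n} \binom{j}{i} b_j$ $(i = 0, 1, \dots, n)$, so that either set of coefficients determines the other.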

The polynomial in either $p$ or $q$ is of degree $n$; the application of this method alone is therefore limited to boundaries containing at most $(n + 1)$ points, since otherwise the number of unknowns exceeds the number of equations for them.
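For a boundary with exactly $n + 1$ points, the $n + 1$ coefficient equations can simply be solved. The Python sketch below builds the coefficient of $p^k$ in each $p^x(1-p)^y$ and solves for the $N(x, y)$. It uses exact `Fraction` arithmetic and plain Gauss-Jordan elimination, not the alternating scheme of section 5, and is only an illustration. The boundary is the same hypothetical curtailed scheme with $n = 4$.

```python
from fractions import Fraction
from math import comb


def solve_by_coefficients(boundary):
    """Solve sum_B N(x, y) p^x (1 - p)^y = 1 for N by equating powers of p."""
    points = sorted(boundary)
    n = max(x + y for x, y in points)
    if len(points) != n + 1:
        raise ValueError("equating coefficients needs exactly n + 1 boundary points")
    size = n + 1
    rows = []
    for k in range(size):                      # equation for the coefficient of p^k
        row = []
        for x, y in points:
            j = k - x                          # p^x (1-p)^y contributes C(y, j) (-1)^j p^(x+j)
            row.append(Fraction(comb(y, j) * (-1) ** j) if 0 <= j <= y else Fraction(0))
        row.append(Fraction(1 if k == 0 else 0))  # right-hand side
        rows.append(row)
    for col in range(size):                    # Gauss-Jordan elimination, exact arithmetic
        pivot = next((i for i in range(col, size) if rows[i][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [v / lead for v in rows[col]]
        for i in range(size):
            if i != col and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[col])]
    return {point: rows[i][size] for i, point in enumerate(points)}


B = {(2, 0), (2, 1), (2, 2), (0, 3), (1, 3)}
print(solve_by_coefficients(B))
```

The solution agrees with a direct count of admissible paths for this boundary.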

4. Properties of the boundary. 

Theorem 1. If $n$ is the maximum index of points in $B$ and if any point of greater index is inaccessible, then $B$ contains at least $n + 1$ points.

There must be at least two boundary points of index $n$, for any such point $(a_n, b_n)$ must be approached from $(a_n - 1, b_n)$ or $(a_n, b_n - 1)$, in which case either $(a_n - 1, b_n + 1)$ or $(a_n + 1, b_n - 1)$ is a boundary point. Let $P$ be any one of these points. At least one admissible path exists from $O$ to $P$; suppose one such path to consist of the points $(a_0, b_0), (a_1, b_1), \dots, (a_n, b_n)$, where $a_k + b_k = k$ $(k = 0, 1, 2, \dots, n)$. It is clear that one or more boundary points exist on the line $x = a_k$ having $y > b_k$, for otherwise the particle could travel indefinitely along this line; similarly one or more exist on $y = b_k$ with $x > a_k$, and if there is just one on each they cannot be identical unless $k = n$, since $(a_k, b_k)$ is not then a boundary point. Initially $(a_0, b_0)$ contributes two boundary points; since then either $a_{k+1} = a_k$ and $b_{k+1} = b_k + 1$, or $a_{k+1} = a_k + 1$ and $b_{k+1} = b_k$, it follows that each succeeding point up to and including $(a_{n-1}, b_{n-1})$ contributes at least one more; the point $(a_n, b_n)$ is counted as soon as $x$ reaches $a_n$ or $y$ reaches $b_n$, whichever occurs first. Consequently there are at least $n + 1$ boundary points.

Conversely, if the boundary contains $n + 1$ points whose maximum index is $m$, such that any point of greater index is inaccessible, then $m \le n$. For suppose $m > n$ and apply the preceding result.

An important class of boundaries therefore comprises those with the minimum number of points necessary to attain a given maximum index; they may conveniently be termed boundaries of minimum size, and for them alone the method of equating coefficients yields a number of equations equal to the number of unknowns, the first being otherwise less than the second.

If there are exactly $n + 1$ boundary points, then $(a_1, b_1), (a_2, b_2), \dots, (a_{n-1}, b_{n-1})$ must each contribute to just one; since $a_{k+1} = a_k$ or $a_k + 1$, there is one point of $B$ on each of the lines $x = 0, x = 1, \dots, x = a_n$, and this set of points $(0, d_0), (1, d_1), \dots, (a_n, b_n)$ can be denoted by $U$, the upper part of the boundary. Clearly $d_{k+1} \ge d_k - 1$, for otherwise more than one boundary point is required on the line $x = k + 1$. Similarly, there must be a second group of points of $B$, $(c_0, 0), (c_1, 1), \dots, (a_n, b_n)$ with $c_{k+1} \ge c_k - 1$, forming the lower boundary $L$; all $(n + 1)$ points have now been enumerated, the point $P$ belonging to both $U$ and $L$. The characteristic of such sets $B$ is that the sequences $U$ and $L$ both have monotonically non-decreasing index; the special case of sequences with monotonically increasing index provides the rejection and acceptance boundaries of non-rectifying industrial inspection procedures. (The difference between rectifying and non-rectifying procedures is clearly stated in the introduction to Anscombe [1].)

Theorem 2. For boundaries of minimum size, any two accessible points not in $B$ of the same index $m$ cannot be separated on the line $x + y = m$ by boundary or inaccessible points. In the terminology of Girshick, Mosteller and Savage, the accessible points not in $B$ form a simple region.

Let $Q(x_1, y_1)$ and $R(x_2, y_2)$ be any two such accessible points of index $m$ and suppose $x_1 < x_2$. There are two possibilities: $(a_m, b_m)$ does or does not lie between $Q$ and $R$.

(i) $(a_m, b_m)$ lies between $Q$ and $R$, i.e. $x_1 < a_m < x_2$. In this case there must be points of $B$ at $Q'(x_1, Y_1)$ with $Y_1 > y_1$ and at $R'(X_2, y_2)$ with $X_2 > x_2$. The boundary from $Q'$ to $P$ and from $R'$ to $P$ has non-decreasing index; hence all points of $U$ on the lines $x = x_1, x = x_1 + 1, \dots, x = a_m - 1$ have index at least $x_1 + Y_1 > m$, and similarly all points of $L$ on the lines $y = y_2, y = y_2 + 1, \dots, y = b_m - 1$ have index at least $X_2 + y_2 > m$. By definition of the boundary there are no additional points of $B$ on either group of lines between the path $OP$ and the line $x + y = n$, so the proof of the theorem is completed.

(ii) If $x_1 > a_m$ or $x_2 < a_m$, the proof is precisely analogous to that given in (i).
5. Justification of the method. Theorem 3. For boundaries of minimum size the equations for $N(x, y)$ are soluble and of rank $n + 1$.

To prove this we give a general method of solution for the system of equations, using powers of $p$ and $q$ alternately; as already remarked, this is equivalent to using the equations from the coefficients of powers of $p$ only. In the first place, note that the coefficient of $q^u$ is a linear combination of numbers $N(x, y)$ with $x + y \ge u$ and $y \le u$, and the coefficient of $p^t$ has $x + y \ge t$ and $x \le t$.

Let $s = \min(d_0, d_1, d_2, \dots, b_n) - 1$.

Then from the coefficients of $q^0, q, \dots, q^s$ can successively be determined $N(c_0, 0), N(c_1, 1), \dots, N(c_s, s)$, the matrix of the equations being triangular with ones in the main diagonal. The points in $U$ at $(r_1, s + 1), (r_2, s + 1), \dots$ now appear in the coefficients of $q^{s+1}, q^{s+2}, \dots$ and complicate the solution.

Let $r = \max(r_1, r_2, \dots)$.

If either $(r, d_r)$ or $(c_s, s)$ is the point $P$, then all the remaining $N(x, y)$ can successively be determined from the coefficients of powers of $p$ when the values of $N(c_0, 0), N(c_1, 1), \dots, N(c_s, s)$ are substituted in the equations. Otherwise the path $OP$ for $y \ge s + 1$ must have $x \ge r + 1$, so that all points of $L$ on $y \ge s + 1$ have $x \ge r + 2$; i.e., any point of $L$ on $x = 0, x = 1, \dots, x = r$ has $y \le s$, and for such points the number of admissible paths is now known. Therefore from the coefficients of $p^0, p^1, \dots, p^r$ can successively be determined $N(0, d_0), N(1, d_1), \dots, N(r, d_r)$, the matrix of these unknowns being again triangular; in particular $N(r_1, s + 1), N(r_2, s + 1), \dots$ can now be found.

Let $s_1 = \min(d_{r+1}, d_{r+2}, \dots, b_n) - 1$, so that $s_1 > s$. The coefficients of $q^{s+1}, q^{s+2}, \dots, q^{s_1}$ give successively $N(c_{s+1}, s + 1), N(c_{s+2}, s + 2), \dots, N(c_{s_1}, s_1)$. For the points in $U$ at $(r_{11}, s_1 + 1), (r_{12}, s_1 + 1), \dots$, let

$r' = \max(r_{11}, r_{12}, \dots)$.

Since there is only one point of $U$ on each line $x = \text{constant}$, $r' > r$. As before, if either $(r', d_{r'})$ or $(c_{s_1}, s_1)$ is $P$, the remaining points of $U$ are soon determined. Otherwise the process continues, and there result an increasing sequence of points of $L$ and a similar sequence for $U$; the process terminates when $(a_n, b_n)$ has been reached in both, when all $N(x, y)$ will have been found. It is clear that for particular cases alternative methods of solution will prove more convenient.

6. Connection with estimation. Suppose that the point $(t, u)$ is accessible, and let $N^*(x, y)$ be the number of admissible paths from $(t, u)$ to $(x, y)$, where $(x, y)$ is in $B$. Then Girshick, Mosteller and Savage have shown that $N^*(x, y)/N(x, y)$ is an unbiased estimate of $p^t(1 - p)^u$, and that a necessary and sufficient condition for it to be the unique unbiased estimate is that the accessible points not in $B$ form a simple finite region. Hence, from Theorem 2, such estimates are unique for schemes with boundaries of minimum size. An alternative proof is given by considering that, if two unbiased estimates of any function of $p$ exist and $f(x, y)$ is the difference between them at $(x, y)$, then

$$\sum_B f(x, y) N(x, y) p^x (1 - p)^y = 0,$$

where $f(x, y)$ is not everywhere zero. The equations formed by equating coefficients have rank $(n + 1)$, as shown by Theorem 3, so that the only solution is $f(x, y) N(x, y) = 0$. Since each $N(x, y)$ is certainly positive, it follows at once that $f(x, y) = 0$, and there can be only one unbiased estimate.


7. An illustration. As an application of the method we take the interesting rectifying sequential inspection scheme discussed by Anscombe. The boundary points are at $(H, 0), (H + b, 1), \dots, (H + \mu b, \mu)$, where $\mu$ is the greatest integer less than $(N - H)/(b + 1)$, and thereafter on the line $x + y = N$. The equations for $N(x, y)$ take here their simplest form, namely equation (4) of Barnard's paper. From the coefficients of $q^0, q^1, \dots, q^y, \dots$,

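$$1 = N(H, 0);$$

$$0 = N(H + b, 1) - \binom{H}{1} N(H, 0), \quad \text{whence } N(H + b, 1) = H;$$

$$0 = N(H + 2b, 2) - \binom{H + b}{1} H + \binom{H}{2}, \quad \text{whence } N(H + 2b, 2) = \frac{H(H + 2b + 1)}{2!};$$

$$0 = N(H + 3b, 3) - \binom{H + 2b}{1} \frac{H(H + 2b + 1)}{2!} + \binom{H + b}{2} H - \binom{H}{3}, \quad \text{whence } N(H + 3b, 3) = \frac{H(H + 3b + 2)(H + 3b + 1)}{3!}.$$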

It now appears reasonable to guess the general term as

$$N(H + yb, y) = \frac{H(H + yb + y - 1)(H + yb + y - 2) \cdots (H + yb + 1)}{y!}.$$

The proof is therefore complete if we show that

$$\binom{H}{y} - \binom{H + b}{y - 1} H + \binom{H + 2b}{y - 2} \frac{H(H + 2b + 1)}{2!} - \binom{H + 3b}{y - 3} \frac{H(H + 3b + 2)(H + 3b + 1)}{3!} + \cdots + (-1)^y \frac{H(H + yb + y - 1)(H + yb + y - 2) \cdots (H + yb + 1)}{y!} = 0.$$
Put $b + 1 = \beta$, and the left-hand side becomes

$$H\left[\frac{(H - 1)!}{(H - y)!\,y!} - \frac{(H + \beta - 1)!}{(H + \beta - y)!\,(y - 1)!\,1!} + \frac{(H + 2\beta - 1)!}{(H + 2\beta - y)!\,(y - 2)!\,2!} - \cdots + (-1)^y \frac{(H + y\beta - 1)!}{(H + y\beta - y)!\,y!}\right],$$

which is $(-1)^y H/y$ times the coefficient of $t^{H + y\beta - y}$ in $(1 + t)^{H - 1}[(1 + t)^\beta - t^\beta]^y$. Rewriting the latter as $t^{y\beta}(1 + t)^{H - 1}[(1 + t^{-1})^\beta - 1]^y$, it becomes clear that the highest power of $t$ is $t^{H + y\beta - y - 1}$, whence the required result follows.
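The general term can also be confirmed by brute force. The Python sketch below counts admissible paths for the boundary of this section and compares the counts with $H(H+yb+y-1)\cdots(H+yb+1)/y!$, which can be written equivalently as $\frac{H}{H+y(b+1)}\binom{H+y(b+1)}{y}$. It also checks that the boundary probabilities sum to 1. This is a check of our own, and the parameter values $H = 3$, $b = 2$, $N = 20$ are arbitrary.

```python
from fractions import Fraction
from math import comb


def anscombe_path_counts(H, b, N_line):
    """N(x, y) for the boundary (H + y*b, y), y = 0, 1, ..., and then the line x + y = N_line."""
    def in_boundary(x, y):
        return x + y == N_line or (x == H + y * b and x + y < N_line)

    counts = {(0, 0): 1}
    for s in range(N_line):  # every point of index N_line is a boundary point
        for x in range(s + 1):
            point = (x, s - x)
            c = counts.get(point, 0)
            if c == 0 or in_boundary(*point):
                continue
            for nxt in ((x + 1, s - x), (x, s - x + 1)):
                counts[nxt] = counts.get(nxt, 0) + c
    return {pt: c for pt, c in counts.items() if in_boundary(*pt)}


def guessed_term(H, b, y):
    """H (H + yb + y - 1) ... (H + yb + 1) / y!  =  H / (H + y(b+1)) * C(H + y(b+1), y)."""
    return Fraction(H, H + y * (b + 1)) * comb(H + y * (b + 1), y)


H, b, N_line = 3, 2, 20
mu = -(-(N_line - H) // (b + 1)) - 1  # greatest integer less than (N - H)/(b + 1)
counts = anscombe_path_counts(H, b, N_line)

for y in range(mu + 1):
    assert counts[(H + y * b, y)] == guessed_term(H, b, y)

p = Fraction(1, 3)
assert sum(c * p ** x * (1 - p) ** y for (x, y), c in counts.items()) == 1
print(mu, [counts[(H + y * b, y)] for y in range(mu + 1)])
```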
NON-PARAMETRIC TOLERANCE LIMITS¹


By R. B. Murphy
Princeton University 


1. Summary. In this note are presented graphs of minimum probable population coverage by sample blocks determined by the order statistics of a sample from a population with a continuous but unknown cumulative distribution function (c.d.f.). The graphs are constructed for the three tolerance levels .90, .95, and .99. The number, $m$, of blocks excluded from the tolerance region runs as follows: $m = 1(1)6(2)10(5)30(10)60(20)100$, and the sample size, $n$, runs from $m$ to 500.

Thus the curves show the solution, $\beta$, of the equation $1 - \alpha = I_\beta(n - m + 1, m)$ for $\alpha = .90, .95, .99$ over the range of $n$ and $m$ given above, where $I_x(p, q)$ is Pearson's notation for the incomplete beta function. Examples are cited below for the one- and two-variate cases. Finally, the exact and approximate formulae used in computations for these graphs are given.

2. Introduction. Suppose a sample of size $n$ is drawn from a population having a continuous cumulative distribution function (c.d.f.), $F(x)$. Let the sample values arranged in order of increasing magnitude be $x_1, x_2, \dots, x_n$. The fraction, $u$, of the population which is included between $x_r$ (the $r$-th smallest value in the sample) and $x_{n-s+1}$ (the $s$-th largest value) is $F(x_{n-s+1}) - F(x_r)$. This quantity $u$ has been called the population coverage for the interval $(x_r, x_{n-s+1})$. The probability element for this coverage is

$$f(u)\,du = \frac{\Gamma(n + 1)}{\Gamma(n - m + 1)\,\Gamma(m)} u^{n-m} (1 - u)^{m-1} \, du, \tag{2.1}$$

where $m = r + s$. From (2.1) we can calculate the probability that this coverage is at least a given amount, say $\beta$. If we call this probability $\alpha$, we have

$$\alpha = \int_\beta^1 f(u) \, du. \tag{2.2}$$

The quantity $\alpha$ is the probability that at least $100\beta\%$ of the population lies between $x_r$ and $x_{n-s+1}$, and it is called the tolerance level. This probability depends only on $n$ and $m$ $(= r + s)$.
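For integer arguments the incomplete beta function reduces to a binomial tail, $I_x(a, b) = \Pr\{\text{Binomial}(a + b - 1, x) \ge a\}$. The relation $1 - \alpha = I_\beta(n - m + 1, m)$, which follows from (2.2), can therefore be solved for $\beta$ by bisection. The Python sketch below does this. `min_coverage` is a hypothetical helper name, not a routine from this note.

```python
from math import comb


def incomplete_beta_int(x, a, b):
    """Regularized incomplete beta I_x(a, b) for positive integers a and b,
    using I_x(a, b) = P(Binomial(a + b - 1, x) >= a)."""
    total = a + b - 1
    return sum(comb(total, j) * x ** j * (1 - x) ** (total - j)
               for j in range(a, total + 1))


def min_coverage(n, m, alpha, tol=1e-10):
    """Solve 1 - alpha = I_beta(n - m + 1, m) for beta (I is increasing in beta)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if incomplete_beta_int(mid, n - m + 1, m) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# The screw example of section 4: n = 100, both extreme blocks discarded (m = 2).
print(min_coverage(100, 2, 0.99))
```

For the screw example this gives a value near 0.935, in line with the 93.5% read from Figure 3 in section 4. When $m = 1$ the solution is simply $\beta = (1 - \alpha)^{1/n}$.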

¹ All computations involved in this paper were carried out under an Office of Naval Research contract.




The idea of coverage is more general than it first appears. If we think of $x_1, x_2, \dots, x_n$ as points plotted along the $x$-axis, we will then have $n + 1$ intervals, $(-\infty, x_1), \dots, (x_n, +\infty)$, which, following Tukey [3], we will call blocks. The reason for this term will be clear when we deal with the case of a sample from a population of more than one variable. The coverage for the $i$-th block $(x_i, x_{i+1})$ is $F(x_{i+1}) - F(x_i)$. The probability element of the sum of the coverages of any preassigned group of $n - m + 1$ blocks is given by (2.1), and hence the probability $\alpha$ that the fraction of the population covered by any $n - m + 1$ blocks is at least $\beta$ is given by (2.2). By preassigned blocks we mean ones designated by order statistics prior to obtaining any sample from which a prediction is to be made with these blocks. In general it is not legitimate, after taking a sample and for some reason evident only then, to specify which blocks in this sample are to be included in or excluded from the coverage. There is no objection, however, to specifying a scheme of blocks for the coverage on the basis of past samples when the scheme is to be applied to future samples.
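A quick simulation illustrates that only the number of discarded blocks matters. Because coverage is defined through $F$, it suffices to sample from the uniform distribution on (0, 1), for which $F(x) = x$. The Python sketch below estimates $\Pr\{u \ge \beta\}$ for two different preassigned choices of $r$ and $s$ with the same $m$. It compares both estimates with the exact value $1 - I_\beta(n - m + 1, m)$. The seed, sample size and $\beta$ are arbitrary choices.

```python
import random
from math import comb


def exact_alpha(n, m, beta):
    """alpha = 1 - I_beta(n - m + 1, m) = P(Binomial(n, beta) <= n - m)."""
    return sum(comb(n, j) * beta ** j * (1 - beta) ** (n - j) for j in range(n - m + 1))


def simulated_alpha(n, r, s, beta, trials=10_000, seed=12345):
    """Fraction of uniform samples whose coverage F(x_{n-s+1}) - F(x_r) is at least beta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = sorted(rng.random() for _ in range(n))
        coverage = xs[n - s] - xs[r - 1]  # x_{n-s+1} minus x_r, since F(x) = x
        hits += coverage >= beta
    return hits / trials


n, beta = 20, 0.8
print(exact_alpha(n, 3, beta), simulated_alpha(n, 1, 2, beta), simulated_alpha(n, 2, 1, beta))
```

Both simulated frequencies should agree with the exact $\alpha$ (about 0.794 here) to within sampling error, although different blocks were discarded.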

The purpose of this note is to present graphs of $\beta$ as a function of $n$ for $m = 1(1)6(2)10(5)30(10)60(20)100$ and for $\alpha = .90, .95, .99$. There are three figures: Figure 1 gives curves for $\alpha = .90$, Figure 2 for $\alpha = .95$, and Figure 3 for $\alpha = .99$. The graphs are accurate to at least two decimal places but never more than three. In terms of the Pearson notation, (2.2) gives, after minor alteration, $1 - \alpha = I_\beta(n - m + 1, m)$. Hence these graphs may also be used to find the 10, 5 and 1 per cent points of a variate $X$ $(0 < X < 1)$ with the c.d.f. $I_x(p, q)$ for $1 < p < 500$ and $1 < q < 100$.

3. Computations for the graphs. If in the relation (2.2) three of the arguments $\alpha$, $\beta$, $m$, and $n$ are given, the solution for the fourth may often be found in Pearson [6] or Thompson [G]. The values of $\beta$ through $n = 100$ were computed exactly for these graphs. For larger $n$, $\beta$ was computed approximately from

$$\beta \approx \left[\frac{\sqrt{(\chi_\alpha^2 - 2m)^2 + 16n(n - m)} - (\chi_\alpha^2 - 2m)}{4n}\right]^2, \tag{3.1}$$

where $\chi_\alpha^2$ is determined by the relation

$$\Pr(\chi^2 \ge \chi_\alpha^2) = 1 - \alpha$$

and has $2m$ degrees of freedom. This approximation is due to Scheffé and Tukey. For large $m$ the Cornish-Fisher approximation to $\chi_\alpha^2$ was used.
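The approximation above is easy to evaluate once $\chi_\alpha^2$ is known. For an even number of degrees of freedom $2m$ the $\chi^2$ tail has the closed form $\Pr(\chi^2 \ge x) = e^{-x/2}\sum_{k=0}^{m-1}(x/2)^k/k!$. The Python sketch below therefore finds $\chi_\alpha^2$ by bisection, not by the Cornish-Fisher expansion used for the graphs. The helper names are ours.

```python
import math


def chi2_upper_point_even(df, tail, rel_tol=1e-12):
    """x with P(chi-square_df >= x) = tail, for even df, via the closed-form tail."""
    k = df // 2

    def upper_tail(x):
        term, total = 1.0, 1.0
        for i in range(1, k):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total

    lo, hi = 0.0, 1.0
    while upper_tail(hi) > tail:  # bracket the root
        hi *= 2
    while hi - lo > rel_tol * hi:
        mid = (lo + hi) / 2
        if upper_tail(mid) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def approx_min_coverage(n, m, alpha):
    """Scheffe-Tukey approximation to beta, with chi-square_alpha on 2m degrees of freedom."""
    c = chi2_upper_point_even(2 * m, 1 - alpha) - 2 * m
    return ((math.sqrt(c * c + 16 * n * (n - m)) - c) / (4 * n)) ** 2


print(chi2_upper_point_even(4, 0.01), approx_min_coverage(100, 2, 0.99))
```

For the screw example of section 4 ($n = 100$, $m = 2$, $\alpha = .99$) this gives about 0.935, close to the exact solution even at the lower end of the range where the approximation was used.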

4. Illustrations of the one-variate case. The most common use to which the graphs presented here may be put is in the prediction of $\beta$ in sampling from a distribution of a single random variable. It is this case that was first presented by Wilks [1]. Suppose in the mass production of a certain type of screw one is interested in the least proportion of all screws manufactured that have lengths between the least and greatest lengths appearing in a random sample of 100 screws. It is assumed that we do not know the distribution of the length, $X$, of a screw produced in this process. Furthermore, it is assumed, of course, that the manufacturing process is in a state of statistical control in the sense of Shewhart. We plan to discard two blocks, $(-\infty, x_1)$ and $(x_{100}, +\infty)$, exactly as many blocks as observations. At the level $\alpha = .99$ we obtain from Figure 3 that at least 93.5% of all screws in the population sampled have lengths that fall between $x_1$ and $x_{100}$. If we now draw a random sample of 100 screws and find the least and greatest screw lengths to be 1.40 and 1.00 inches respectively, we may say that at least 93.5% of all screws from the population sampled have lengths between 1.40 and 1.00 inches at the .99 tolerance level. It must be observed that the prediction is made on the basis of preassigned order statistics, and not of the values 1.40 and 1.00.

We might equally well have put the question in another way. If we want at least 93.5% of the lengths of all screws to lie within the range of lengths of a sample of 100 screws, then at the tolerance level α = .99, what is the smallest sample we could have in which as many as 2% of the sample are not acceptable? Examining the intersections of the curves in Figure 3 with the line β = .935, we choose the smallest n such that m/n ≥ .02 and find n = 100.
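The reverse question can also be answered by direct search rather than by reading Figure 3. The Python sketch below uses a hypothetical helper, `smallest_sample`, and rests on two assumptions:

- the coverage of n − m + 1 blocks is Beta(n − m + 1, m) distributed;
- "as many as 2%" means taking m as the smallest integer with m/n ≥ .02.

```python
import math


def coverage_prob(n: int, m: int, beta: float) -> float:
    """Pr(u >= beta) for u ~ Beta(n - m + 1, m), via Pr(Bin(n, beta) <= n - m)."""
    upper = sum(
        math.comb(n, k) * beta**k * (1.0 - beta) ** (n - k)
        for k in range(n - m + 1, n + 1)
    )
    return 1.0 - upper


def smallest_sample(beta: float, alpha: float, frac: float, n_max: int = 2000):
    """Smallest n, with m the least integer such that m/n >= frac, for which
    Pr(coverage >= beta) >= alpha.  Returns the pair (n, m)."""
    for n in range(1, n_max + 1):
        # The small tolerance keeps e.g. 0.02 * 50 from rounding up past 1.
        m = max(1, math.ceil(frac * n - 1e-9))
        if m <= n and coverage_prob(n, m, beta) >= alpha:
            return n, m
    raise ValueError("no n up to n_max meets the requirement")


print(smallest_sample(beta=0.935, alpha=0.99, frac=0.02))
```

Searching n upward is cheap here, because each probability needs only m binomial terms.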

5. The case of more than one variate. The ideas given in the introduction may be extended to sampling situations involving two or more statistically dependent variates with a continuous joint c.d.f. by means of the notion of blocks. The abstract formulation is given by Tukey [3]. We shall restrict ourselves to the case of two dependent variates X and Y, but the generalization is obvious. Because of the dependence, the joint population of X and Y may be expressed as an associated pair of values W = (X, Y). Suppose a sample of size n is drawn from this population, and let the pairs be w_1, w_2, ..., w_n, where w_i = (x_i, y_i).

We now choose a sequence of n numerically valued functions of x and y (or of w), f_1(w), ..., f_n(w). Let us order the w_i in a sequence w_1^(1), w_2^(1), ..., w_n^(1) such that f_1(w_{i+1}^(1)) > f_1(w_i^(1)). Imagine now that the sample values are plotted in a plane scatter diagram. We call the first block the set of points w = (x, y) such that f_1(w) < f_1(w_1^(1)). That is, we may imagine the curve f_1(x, y) − f_1(w_1^(1)) = 0 plotted in the plane, and the first block is bounded by this curve.

Then, discarding w_1^(1), we take the n − 1 remaining w_i and order them in a sequence w_1^(2), w_2^(2), ..., w_{n−1}^(2) such that f_2(w_{i+1}^(2)) > f_2(w_i^(2)). We call the second block the set of points w = (x, y) such that f_1(w) > f_1(w_1^(1)) and also f_2(w) < f_2(w_1^(2)). Thus the second block is bounded by the curves f_1(x, y) − f_1(w_1^(1)) = 0 and f_2(x, y) − f_2(w_1^(2)) = 0.

If we continue this process of discarding and reordering until all n functions f_i are used, we shall obtain a division of the plane into n + 1 non-overlapping blocks, the "extra" block arising at the last step in the process. Then the fraction, u, of "points" (X, Y) of the joint population of X and Y that are covered by any n − m + 1 blocks has the probability element (2.1). Also, the probability α that the population coverage, u, will be at least as large as β is given by (2.2). The n − m + 1 blocks constitute a tolerance region.
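The sequential construction of blocks is mechanical enough to simulate. The sketch below is our own illustration, not part of the original:

- it uses a hypothetical dependent population (X uniform, Y = X plus uniform noise);
- the functions, fixed before sampling, are also hypothetical and alternate between y and −x;
- it builds the n + 1 blocks from each simulated sample and estimates the mean coverage of the blocks kept.

If, as the distribution-free theory implies, each block has expected coverage 1/(n + 1), then discarding m = 2 of 11 blocks should leave a mean coverage near 9/11.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Func = Callable[[Point], float]


def cut_values(sample: Sequence[Point], funcs: Sequence[Func]) -> List[float]:
    """Step k discards the remaining point with the smallest f_k value, w_1^(k), and
    records c_k = f_k(w_1^(k)); block k is {w : f_j(w) > c_j for j < k, f_k(w) < c_k}."""
    if len(funcs) != len(sample):
        raise ValueError("need exactly one function per observation")
    remaining = list(sample)
    cuts = []
    for f in funcs:
        w = min(remaining, key=f)
        cuts.append(f(w))
        remaining.remove(w)
    return cuts


def block_of(w: Point, funcs: Sequence[Func], cuts: Sequence[float]) -> int:
    """Index 1..n+1 of the block containing w (block boundaries have probability zero)."""
    for k, (f, c) in enumerate(zip(funcs, cuts), start=1):
        if f(w) < c:
            return k
    return len(cuts) + 1  # the "extra" block


def draw(rng: random.Random) -> Point:
    """Hypothetical dependent pair: X uniform on (0, 1), Y = X + uniform noise."""
    x = rng.random()
    return (x, x + rng.random())


def mean_coverage(n=10, discard=(1, 11), trials=400, probes=200, seed=1) -> float:
    """Monte Carlo mean coverage of the blocks kept after discarding those in `discard`."""
    rng = random.Random(seed)
    # Illustrative functions, fixed before sampling: alternate f(w) = y and f(w) = -x.
    funcs = [(lambda w: w[1]) if k % 2 == 0 else (lambda w: -w[0]) for k in range(n)]
    total = 0.0
    for _ in range(trials):
        cuts = cut_values([draw(rng) for _ in range(n)], funcs)
        kept = sum(block_of(draw(rng), funcs, cuts) not in discard for _ in range(probes))
        total += kept / probes
    return total / trials


# n = 10, m = 2 discarded blocks: the mean should be near (n - m + 1)/(n + 1) = 9/11.
print(round(mean_coverage(), 3), round(9 / 11, 3))
```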
An extension of this case has been made by Wald [2]. Namely, before a sample is taken let us choose a numerically valued function f of w. Let us also choose h (≤ n) of the integers 1, 2, ..., n and order them in a sequence a_1, a_2, ..., a_h with a_{j+1} > a_j. When the sample is ordered by the values of f, the corresponding sample points then satisfy f(w_{a_{j+1}}) > f(w_{a_j}).

Next, within each "strip" of the (x, y) plane such that w = (x, y) satisfies f(w_{a_{j+1}}) > f(w) > f(w_{a_j}), suppose that we follow the construction in the previous paragraph. Then the population coverage, u, by n − m + 1 blocks from one or more of these strips or their exteriors has the probability element (2.1).

Again the warning must be made that all of the following must be completely specified before samples are drawn to which this scheme is to be applied: the functions f, f_1, f_2, ..., f_n, the numbers a_1, a_2, ..., a_h, and the sequence of construction.

6. Illustrations for two variates. As an example of the use of the graphs for a two-variate case, we use an example cited by Tippett [8]. The two variates are the percentage of pig iron, X, and the lime consumption, Y, per cwt. of steel in 100 steel castings made without slag control. A scatter diagram is given in Figure 4. Unfortunately the value of this example is lessened by the fact that the block schemes were made after the sample had been taken. It does at least illustrate the two simple types of scheme.

The tolerance region T (solid lines in Figure 4) resulted from the following scheme: let f_1(w) = y, f_2(w) = f_3(w) = f_4(w) = f_5(w) = f_6(w) = −y. Now follow the Wald procedure, choosing f(w) = y with h = 6, and a_1 = 1, a_2 = 13, a_3 = 40, a_4 = 75, a_5 = 90, a_6 = 96. Then in each strip y_{a_{j+1}} > y > y_{a_j} let f_i(w) = x. Considering only the blocks within the heavy line as the tolerance region, we have, by counting the discarded blocks, m = 16.

In constructing the region T′ (broken lines in Figure 4) we also use Wald's method, taking f(w) = y − 5x with h = 2 and a_1 = 3, a_2 = 96.

- **Exterior regions.** In the exterior region with f(w) > f(w_{a_2}) let all f_i = y + 5x, and similarly in the exterior region f(w) < f(w_{a_1}).
- **Middle strip.** In the strip f(w_{a_2}) > f(w) > f(w_{a_1}) (i.e., in the region in which 41 > y − 5x > −77) choose:
  - f_1(w) = y;
  - f_2(w) = f_3(w) = f_4(w) = −y;
  - f_5(w) = f_6(w) = f_7(w) = y + 5x;
  - f_8(w) = f_9(w) = −y − 5x.

Counting the blocks outside the heavily bordered region, we have m = 17.

We obtain by interpolation β = .80 for T and β = .78 for T′ at the α = .90 level.
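Because β depends only on n, m and α, the interpolated values for T and T′ can be cross-checked numerically. The sketch assumes, on our reading of (2.1) and (2.2) (not reproduced in this section), that the coverage of n − m + 1 blocks is Beta(n − m + 1, m) distributed. In our hand calculation the results come out near .79 and .78, in line with values read from a graph.

```python
import math


def coverage_prob(n: int, m: int, beta: float) -> float:
    """Pr(u >= beta) for u ~ Beta(n - m + 1, m), i.e. Pr(Bin(n, beta) <= n - m)."""
    upper = sum(
        math.comb(n, k) * beta**k * (1.0 - beta) ** (n - k)
        for k in range(n - m + 1, n + 1)
    )
    return 1.0 - upper


def exact_beta(n: int, m: int, alpha: float, iters: int = 60) -> float:
    """Largest beta with Pr(u >= beta) >= alpha, by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if coverage_prob(n, m, mid) >= alpha:
            lo = mid
        else:
            hi = mid
    return lo


# Tippett's 100 castings: m = 16 blocks discarded for T, m = 17 for T'.
for label, m in (("T", 16), ("T'", 17)):
    print(label, m, round(exact_beta(100, m, 0.90), 3))
```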

7. Ties. A tie is a sample point which, in a coordinate system defining a set of order statistics, coincides in one or more coordinates with other sample points. For instance, in the X coordinate of our example, (32, 159) and (32, 185) are tied, and (47, 218) and (47, 218) are tied in any system of coordinates. It would seem easier to avoid ties with regions of the type of T′ than with those of the type of T.

The existence of ties in the population is assumed impossible, because positive point probabilities would destroy the continuity of the c.d.f. Therefore we attribute the ties to the crudity of measuring devices. A procedure for handling ties is given by Tukey [ ij.
THE FOURTH DEGREE EXPONENTIAL DISTRIBUTION FUNCTION¹

By Leo A. Aroian
Hunter College

We shall derive a recursion formula for the moments of the fourth degree exponential distribution function and state its more characteristic features. We shall also show how the graduation of observed distributions may be accomplished by the method of moments and the method of maximum likelihood. The purpose of the note is to make possible a wider use of this function.

R. A. Fisher [1] introduced the fourth degree exponential function

$$ y_t = k \exp\left[-(\beta_1 t + \beta_2 t^2 + \beta_3 t^3 + \beta_4 t^4)\right], \tag{1} $$

where r_1 ≤ t ≤ r_2 and t = (x − m)/σ. Here m indicates the population mean and σ the population standard deviation, and the β's are functions of

$$ \alpha_n = \int_{r_1}^{r_2} t^n y_t \, dt. $$

A. L. O'Toole, in two stimulating papers [2], [3], has studied (1); however, his methods and results are unnecessarily complicated. O'Toole requires eight moments to determine parameters similar to the β's. Both Fisher and O'Toole considered the restricted class of (1) with range (−∞, ∞).

Let

$$ u = t^n \exp\left[-(\beta_1 t + \beta_2 t^2 + \beta_3 t^3)\right], \qquad dv = t^3 e^{-\beta_4 t^4}\, dt, \tag{2} $$

and integrate by parts, using

$$ \alpha_n = \int_{r_1}^{r_2} t^n y_t \, dt, \tag{3} $$

obtaining

$$ 4\beta_4\alpha_{n+3} + 3\beta_3\alpha_{n+2} + 2\beta_2\alpha_{n+1} + \beta_1\alpha_n = n\alpha_{n-1}, \qquad n = 1, 2, 3, \ldots, \tag{4} $$

and for n = 0 the right side of (4) is defined as zero. The result (4) is valid under the assumption

$$ \left[ t^n y_t \right]_{r_1}^{r_2} = 0. \tag{5} $$

¹ Presented to the American Mathematical Society and the Institute of Mathematical Statistics, September 4, 1947.
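Relation (4) can be verified numerically for any particular choice of the β's. The following sketch uses illustrative β values of our own choosing. It computes the moments α_0, ..., α_8 of (1) by Simpson's rule on a finite interval wide enough for the tails to be negligible, then evaluates the difference between the two sides of (4) for n = 0, ..., 4. The moments are taken about t = 0 without standardizing, since the integration by parts behind (4) does not use α_1 = 0 or α_2 = 1.

```python
import math


def moments(betas, lo=-10.0, hi=10.0, steps=4000, top=8):
    """alpha_0..alpha_top of y_t = k exp(-(b1 t + b2 t^2 + b3 t^3 + b4 t^4)) on [lo, hi],
    with k chosen so that alpha_0 = 1; composite Simpson's rule (steps must be even)."""
    b1, b2, b3, b4 = betas
    h = (hi - lo) / steps
    raw = [0.0] * (top + 1)
    for i in range(steps + 1):
        t = lo + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        y = math.exp(-(b1 * t + b2 * t**2 + b3 * t**3 + b4 * t**4))
        for n in range(top + 1):
            raw[n] += weight * y * t**n
    return [r / raw[0] for r in raw]  # the common factor h/3 cancels here


def residual(betas, a, n):
    """Left side of (4) minus its right side (the right side is 0 for n = 0)."""
    b1, b2, b3, b4 = betas
    lhs = 4 * b4 * a[n + 3] + 3 * b3 * a[n + 2] + 2 * b2 * a[n + 1] + b1 * a[n]
    rhs = n * a[n - 1] if n > 0 else 0.0
    return lhs - rhs


betas = (0.1, 0.5, 0.05, 0.02)  # illustrative values; beta_4 > 0 allows the range (-inf, inf)
a = moments(betas)
print([f"{residual(betas, a, n):.1e}" for n in range(5)])
```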

Given the first six moments, β_1, β_2, β_3 and β_4 are readily determined. It will be found that if β_4 > 0, then r_1 = −∞ and r_2 = ∞, while if β_4 < 0, r_1 and r_2 will be finite. If we set n = 0, 1, 2, 3 in (4), the solutions are

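$$ \begin{aligned}
\beta_4 &= \frac{\alpha_3(\alpha_5 - 4\alpha_3) - (\alpha_4 - 3)(\alpha_4 - 1)}{4D},\\
\beta_3 &= \frac{\alpha_3(\alpha_3^2 + 3\alpha_4 - \alpha_6) + (\alpha_5 - \alpha_3)(\alpha_4 - 3)}{3D},\\
\beta_2 &= \frac{(\alpha_3 - \alpha_5)(\alpha_5 - 4\alpha_3) + (\alpha_4 - 1)(\alpha_6 - \alpha_3^2 - 3\alpha_4)}{2D},\\
\beta_1 &= \frac{\alpha_3(\alpha_6 - \alpha_3\alpha_5 - 3\alpha_4 + 3\alpha_3^2) - (\alpha_4 - 3)(\alpha_5 - \alpha_3\alpha_4)}{D},
\end{aligned} \tag{6} $$

where

$$ D = (\alpha_6 - \alpha_4^2 - \alpha_3^2)(\alpha_4 - \alpha_3^2 - 1) - (\alpha_5 - \alpha_3 - \alpha_3\alpha_4)^2 \ge 0. $$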
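The closed forms (6) can be checked against a direct numerical solution of the four linear equations obtained from (4) with n = 0, 1, 2, 3, using α_0 = 1, α_1 = 0 and α_2 = 1. The sketch below does both; the helper names and the test moments are ours and arbitrary. The normal moments α_3 = 0, α_4 = 3, α_5 = 0, α_6 = 15 should return β_2 = 1/2 and D = 12, with the other β's zero.

```python
def betas_closed_form(a3, a4, a5, a6):
    """beta_1..beta_4 and D from (6), assuming alpha_1 = 0 and alpha_2 = 1."""
    D = (a6 - a4**2 - a3**2) * (a4 - a3**2 - 1) - (a5 - a3 - a3 * a4) ** 2
    b4 = (a3 * (a5 - 4 * a3) - (a4 - 3) * (a4 - 1)) / (4 * D)
    b3 = (a3 * (a3**2 + 3 * a4 - a6) + (a5 - a3) * (a4 - 3)) / (3 * D)
    b2 = ((a3 - a5) * (a5 - 4 * a3) + (a4 - 1) * (a6 - a3**2 - 3 * a4)) / (2 * D)
    b1 = (a3 * (a6 - a3 * a5 - 3 * a4 + 3 * a3**2) - (a4 - 3) * (a5 - a3 * a4)) / D
    return [b1, b2, b3, b4], D


def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def betas_from_recursion(a3, a4, a5, a6):
    """Solve (4) for n = 0, 1, 2, 3 directly, with alpha_0 = 1, alpha_1 = 0, alpha_2 = 1."""
    a = [1.0, 0.0, 1.0, a3, a4, a5, a6]
    rows = [[a[n], 2 * a[n + 1], 3 * a[n + 2], 4 * a[n + 3]] for n in range(4)]
    rhs = [n * a[n - 1] if n > 0 else 0.0 for n in range(4)]
    return solve(rows, rhs)


print(betas_closed_form(0.0, 3.0, 0.0, 15.0))  # normal moments
```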


To prove D ≥ 0 we adopt the method of J. E. Wilkins, Jr. [4]. In only a trivial case is D = 0. Let

$$ G(a, b, c, d) = \int_{r_1}^{r_2} (a + bt + ct^2 + dt^3)^2 \, y_t \, dt \ge 0, $$

where y_t is any probability function with range r_1 ≤ t ≤ r_2. Since G(a, b, c, d) is a semi-definite quadratic form in a, b, c, d, its discriminant will be non-negative. This discriminant is the determinant of the matrix of coefficients, whose entries are the moments α_{i+j}, i, j = 0, 1, 2, 3. But its discriminant is easily seen to be equal to D; thus D ≥ 0.
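The identity between D and the discriminant of G can be checked numerically. The sketch below (helper names ours) uses an arbitrary sample, whose empirical distribution plays the part of y_t. It computes the 4 × 4 determinant of the standardized moments α_{i+j} and compares it with the expression for D; since a sample with at least four distinct values gives a positive definite form, D should come out strictly positive.

```python
def det(M):
    """Determinant by Gaussian elimination with partial pivoting (works on a copy)."""
    M = [list(row) for row in M]
    n, result = len(M), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[piv][col] == 0.0:
            return 0.0
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            result = -result
        result *= M[col][col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
    return result


def standardized_moments(xs, top=6):
    """a_j = sum(t^j)/N with t = (x - mean)/s and s^2 = sum((x - mean)^2)/N."""
    N = len(xs)
    mean = sum(xs) / N
    s = (sum((x - mean) ** 2 for x in xs) / N) ** 0.5
    ts = [(x - mean) / s for x in xs]
    return [sum(t**j for t in ts) / N for j in range(top + 1)]


def D_of(a):
    """The expression for D, with alpha_j replaced by a[j]."""
    a3, a4, a5, a6 = a[3], a[4], a[5], a[6]
    return (a6 - a4**2 - a3**2) * (a4 - a3**2 - 1) - (a5 - a3 - a3 * a4) ** 2


data = [0.3, 1.1, 1.9, 2.2, 2.8, 3.5, 4.0, 5.6, 7.1, 9.4]  # arbitrary illustrative sample
a = standardized_moments(data)
hankel = [[a[i + j] for j in range(4)] for i in range(4)]  # coefficient matrix of G
print(D_of(a), det(hankel))
```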
We summarize without proofs the essential features of the fourth degree exponential. Near the normal point, α_4 = 3, α_3 = 0, the fourth degree exponential function, the Pearson system and the Gram-Charlier Type A are essentially alike. Type C [5], while similar, is not the same. Note that β_4 may be negative, and in such a case r_1 and r_2 are the two real zeros of the derivative of (1). The exponential may be bimodal as well as unimodal, and the normal curve is the special case β_1 = β_3 = β_4 = 0. Various special cases where a particular β is zero are readily handled by either (4) or (6). The graduation of both unimodal and bimodal observed distributions will be published elsewhere.

Let

$$ y_t = k \exp\Big(-\sum_{i=1}^{r} \beta_i t^i\Big), \qquad r_1 < t < r_2, \tag{8} $$

where

$$ \frac{1}{k} = \int_{r_1}^{r_2} \exp\Big(-\sum_{i=1}^{r} \beta_i t^i\Big)\, dt. \tag{9} $$

The likelihood, L, in a sample of N is given by

$$ L = k^N \exp\Big[-\Big(\beta_1 \sum_{j=1}^{N} t_j + \beta_2 \sum_{j=1}^{N} t_j^2 + \cdots + \beta_r \sum_{j=1}^{N} t_j^r\Big)\Big], \tag{10} $$

where t_j = (x_j − m)/σ. Then
If we assume either that r_1 and r_2 are constant, or that exp(−Σ β_i t^i) is negligible at r_1 and r_2, then (12) becomes

$$ \frac{\partial \log L}{\partial \beta_j} = N\alpha_j - \sum_{i=1}^{N} t_i^j , $$

so that setting ∂ log L/∂β_j = 0 implies

$$ \alpha_j = a_j, \qquad j = 1, 2, \ldots, r, \tag{13} $$

where a_j = Σ t_i^j / N is the sample estimate of α_j. For, if in Σ t_i^j / N we let j = 1, 2, we find by (13) that x̄ = m and σ² = Σ(x_i − x̄)²/N. The solution of (13) provides estimates of β_1, β_2, β_3 and β_4 if we set r = 4. Naturally more time is required for the solution of (13) than for the method of moments, but the maximum likelihood estimates are asymptotically efficient.

The system (13) must be solved by successive approximations. To determine the moments solution, all we do is replace α_j by a_j in equations (6). This affords a point of departure from which the maximum likelihood equations may be solved. The two methods are not the same.
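The moments solution amounts to a few lines of code. In this sketch (our own, with an arbitrary symmetric sample):

1. The data are standardized with m = x̄ and σ² = Σ(x_i − x̄)²/N, so that a_1 = 0 and a_2 = 1.
2. The sample moments a_3, ..., a_6 are substituted for α_3, ..., α_6 in (6).
3. The resulting β's could then serve as starting values for the successive approximations.

For a symmetric sample a_3 = a_5 = 0, so β_1 and β_3 vanish. For this particular sample β_4 comes out negative, which by the remarks above points to a finite range.

```python
def standardized_moments(xs, top=6):
    """Sample moments a_j of t = (x - xbar)/s, using m = xbar and
    sigma^2 = sum((x - xbar)^2)/N; hence a_1 = 0 and a_2 = 1."""
    N = len(xs)
    xbar = sum(xs) / N
    s = (sum((x - xbar) ** 2 for x in xs) / N) ** 0.5
    ts = [(x - xbar) / s for x in xs]
    return [sum(t**j for t in ts) / N for j in range(top + 1)]


def moments_solution(xs):
    """Method of moments: replace alpha_j by a_j in the closed forms (6)."""
    a = standardized_moments(xs)
    a3, a4, a5, a6 = a[3], a[4], a[5], a[6]
    D = (a6 - a4**2 - a3**2) * (a4 - a3**2 - 1) - (a5 - a3 - a3 * a4) ** 2
    b4 = (a3 * (a5 - 4 * a3) - (a4 - 3) * (a4 - 1)) / (4 * D)
    b3 = (a3 * (a3**2 + 3 * a4 - a6) + (a5 - a3) * (a4 - 3)) / (3 * D)
    b2 = ((a3 - a5) * (a5 - 4 * a3) + (a4 - 1) * (a6 - a3**2 - 3 * a4)) / (2 * D)
    b1 = (a3 * (a6 - a3 * a5 - 3 * a4 + 3 * a3**2) - (a4 - 3) * (a5 - a3 * a4)) / D
    return b1, b2, b3, b4


data = [-3, -1, -1, 0, 0, 0, 1, 1, 3]  # symmetric illustrative sample
print([round(b, 4) for b in moments_solution(data)])
```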

The fourth degree exponential is readily generalized to a fourth (or rth) degree multivariate function including the normal multivariate function as a special case.
AN APPROXIMATION TO THE BINOMIAL SUMMATION

By G. F. Crumer
Washington, D. C.

We consider the binomial expansion (q + p)^n, where q = 1 − p and n is a positive integer. For given values of n, p, r and s, where np < r < s ≤ n, we are often interested in the probability P(r ≤ x ≤ s) that the number of successes x will satisfy r ≤ x ≤ s.

Several kinds of tables serve for the usual cases:

- When n does not exceed 50, we can use tables of the Incomplete Beta Function, or other convenient and accurate tables.
- For "large" values of n, we can use normal tables.
- When p is "small", we can use Poisson tables.

However, it is often true that p is fairly small, and yet not small enough to give really accurate results when Poisson tables are employed in the usual way. At the same time, n may be too large for use of the tables of the Incomplete Beta Function and yet too small for accurate use of normal tables.

It frequently happens that an upper bound for P(r ≤ x ≤ s) would serve our purpose. We propose to show how to find this from Poisson tables with greater accuracy than could be obtained by using these tables in the ordinary way.

We shall use three kinds of general term:

- the binomial expansion: B_i = [n!/(i!(n − i)!)] p^i q^{n−i};
- the corresponding Poisson distribution with the same value of p: P_i = (pn)^i e^{−pn}/i!;
- a second Poisson distribution: P'_i = (p'n)^i e^{−p'n}/i!, where p' ≠ p will be determined later.

We shall use the following notations:

$$ U_i = \frac{B_{i+1}}{B_i} = \frac{(n - i)p}{(i + 1)(1 - p)}; \tag{1} $$

$$ V_i = \frac{P_{i+1}}{P_i} = \frac{pn}{i + 1}; \tag{2} $$

$$ V'_i = \frac{P'_{i+1}}{P'_i} = \frac{p'n}{i + 1}; \tag{3} $$

$$ U_i - V_i = \frac{p(np - i)}{(i + 1)(1 - p)}. \tag{4} $$

From (4) we obtain at once the following:

Lemma I. U_i > V_i or U_i < V_i according as i < np or i > np.

Thus, the size of the general term of the binomial expansion falls off more steeply to the right of i = np than does that of the general Poisson term.
We can use Lemma I to obtain an upper bound to P(r ≤ x ≤ s) for any r > np. By Lemma I, B_{i+1}/B_i = U_i < V_i = P_{i+1}/P_i for every i ≥ r > np, and hence

$$ \begin{aligned}
B_r &= B_r P_r / P_r,\\
B_{r+1} &< B_r P_{r+1} / P_r,\\
B_{r+2} &< B_{r+1} P_{r+2} / P_{r+1} < B_r P_{r+2} / P_r,\\
&\;\;\vdots\\
B_s &< B_r P_s / P_r.
\end{aligned} $$

Adding these, we obtain

$$ P(r \le x \le s) = \sum_{i=r}^{s} B_i < \frac{B_r}{P_r} \sum_{i=r}^{s} P_i = \frac{B_r}{P_r}\Big(\sum_{i=r}^{\infty} P_i - \sum_{i=s+1}^{\infty} P_i\Big). \tag{5} $$

The quantity in parentheses in (5) can be found by use of the cumulative Poisson table, provided, of course, it is within the range of that table, while B_r/P_r can be computed directly.
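Inequality (5) is easy to evaluate directly. In the sketch below (function names ours), the Poisson sum is computed term by term instead of being taken from a cumulative table. The exact binomial sum is printed alongside, for the example n = 40, p = 1/10, r = 10, s = 40 that the paper works through.

```python
import math


def binom_pmf(n: int, p: float, i: int) -> float:
    """B_i = C(n, i) p^i q^(n - i)."""
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)


def poisson_pmf(lam: float, i: int) -> float:
    """P_i = lam^i e^(-lam) / i!."""
    return math.exp(-lam) * lam**i / math.factorial(i)


def bound5(n: int, p: float, r: int, s: int) -> float:
    """Upper bound (5): (B_r / P_r) * sum_{i=r}^{s} P_i, valid for r > np."""
    if r <= n * p:
        raise ValueError("(5) needs r > np")
    lam = n * p
    poisson_sum = sum(poisson_pmf(lam, i) for i in range(r, s + 1))
    return binom_pmf(n, p, r) / poisson_pmf(lam, r) * poisson_sum


n, p, r, s = 40, 0.1, 10, 40
exact = sum(binom_pmf(n, p, i) for i in range(r, s + 1))
print(f"exact {exact:.7f}  bound (5) {bound5(n, p, r, s):.6f}")
```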

In the work we have done so far, we have used a Poisson distribution which is less steep than the corresponding binomial distribution throughout the whole interval np < r ≤ x ≤ n. It seems reasonable to investigate the possibility of improving upon (5) by using a Poisson distribution having a different value p' in place of p. We choose p' so that the new Poisson distribution is of the same steepness at x = r as is the binomial distribution. We wish to have U_r = V'_r and U_i ≤ V'_i for all r ≤ i ≤ n. The first of these conditions requires that (n − r)p/[(r + 1)(1 − p)] = p'n/(r + 1). Solving for p', we obtain

$$ p' = \frac{(n - r)p}{n(1 - p)}. \tag{6} $$

We are now ready to prove the following:

Lemma II. If p' is defined by (6) and if U_i, V_i and V'_i are defined by (1), (2) and (3) respectively, then U_i ≤ V'_i < V_i, provided r > np and i ≥ r.

The proof has two steps:

1. It is easy to see that U_i/V'_i = (n − i)p(i + 1)/[(i + 1)(1 − p)(np')]. This can be reduced to (n − i)/(n − r) by replacing p' by its value from (6). Then U_i/V'_i ≤ 1, since i ≥ r.
2. Moreover, we have V'_i/V_i = (p'n)(i + 1)/[(i + 1)(pn)] = p'/p = (n − r)/(n − np). But r > np, and hence V'_i < V_i.

This completes the proof of Lemma II.
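Lemma II can be spot-checked numerically. The sketch below (our own helper) computes U_i, V_i and V'_i from (1) to (3), with p' from (6). It confirms both the ratio U_i/V'_i = (n − i)/(n − r) and the ordering U_i ≤ V'_i < V_i for i ≥ r. A small relative tolerance absorbs rounding at i = r, where U_r = V'_r. With r < np the ordering V'_i < V_i fails, as the last step of the proof requires r > np.

```python
def lemma_ii_holds(n: int, p: float, r: int, rel_tol: float = 1e-12) -> bool:
    """Check U_i <= V'_i < V_i for r <= i < n, with p' taken from (6)."""
    p_prime = (n - r) * p / (n * (1 - p))
    for i in range(r, n):
        U = (n - i) * p / ((i + 1) * (1 - p))  # (1)
        V = p * n / (i + 1)                    # (2)
        V_prime = p_prime * n / (i + 1)        # (3)
        # The proof's identity U_i / V'_i = (n - i)/(n - r).
        if abs(U / V_prime - (n - i) / (n - r)) > 1e-9:
            return False
        if not (U <= V_prime * (1 + rel_tol) and V_prime < V):
            return False
    return True


print((40 - 10) * 0.1 / (40 * 0.9))  # p' for n = 40, r = 10, p = .1
print(lemma_ii_holds(40, 0.1, 10), lemma_ii_holds(40, 0.1, 3))
```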

We are now in a position to obtain an inequality somewhat better than (5). The derivation of the new upper bound for P(r ≤ x ≤ s) goes just as before, except that each P_i is replaced by P'_i. We obtain the new inequality

$$ P(r \le x \le s) < K' B_r / P'_r, \tag{7} $$

where

$$ K' = \sum_{i=r}^{\infty} P'_i - \sum_{i=s+1}^{\infty} P'_i . $$

We can get a lower bound as well as a somewhat improved upper bound for P(r ≤ x ≤ s). To do so we calculate B_r and B_{r+1} directly and then apply (5) or (7) to find an upper bound M of P(r + 1 ≤ x ≤ s). This gives the inequality

$$ B_r + B_{r+1} < P(r \le x \le s) < B_r + M. \tag{8} $$

This could, of course, be still further improved by calculating directly still more of the B_i's and using a similar procedure, but one would not care to carry this very far.

To illustrate the various approximations, we have worked out a numerical example, the results of which appear below. For convenience in checking, we have used a value of n which is within the range of the tables of the Incomplete Beta Function, even though we would ordinarily use our method only for larger values of n.

Example. s = n = 40; r = 10; p = 1/10; p' = 1/12. Here P(c, λ) denotes the cumulative Poisson probability of c or more occurrences when the mean is λ.

- **Exact value.** The tables of the Incomplete Beta Function give P(10 ≤ x ≤ 40) = .0050631.
- **Poisson tables in the usual way.** We get P(10, 4) − P(40, 4) = .008132, which is not particularly good.
- **Inequality (5).** We obtain B_10/P_10 = .6790 and P(10 ≤ x ≤ 40) < .6790(.008132) = .005522.
- **(8), computing B_10 and B_11.** We take r = 11 in inequality (5) and obtain B_10 = .0035934, B_11 = .0010889, P(11, 4) − P(40, 4) = .002840 and B_11/P_11 = .5657. Hence .004682 < P(10 ≤ x ≤ 40) < .003594 + .001607 = .00520.
- **(8), computing B_12 also.** Using r = 12 in inequality (5), we get .004974 < P(10 ≤ x ≤ 40) < .005099, which is quite good.
- **Inequality (7).** We can obtain a still better result by using (7) instead of (5). Then p' = 1/12, np' = 10/3, B_10/P'_10 = 2.158, P(10, 10/3) − P(40, 10/3) = .002356, and P(10 ≤ x ≤ 40) < .005087.
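The whole example can be reproduced without tables. The sketch below (names ours) computes the binomial and Poisson terms directly:

- `bounds8` generalizes (8) by computing the first k binomial terms directly. Here k = 1 is the case described in the text, and k = 2 is the refinement that also uses B_12.
- `use_p_prime` switches from (5) to (7).

Direct computation agrees with the values quoted above to within about one unit in the last place; for (7) it gives about .005086.

```python
import math


def binom_pmf(n, p, i):
    """B_i = C(n, i) p^i q^(n - i)."""
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)


def poisson_pmf(lam, i):
    """Poisson term lam^i e^(-lam) / i!."""
    return math.exp(-lam) * lam**i / math.factorial(i)


def upper_bound(n, p, r, s, use_p_prime=False):
    """(5) with Poisson mean np, or (7) with mean np', where p' = (n - r)p / (n(1 - p))."""
    lam = (n - r) * p / (1 - p) if use_p_prime else n * p  # n p' = (n - r)p/(1 - p)
    poisson_sum = sum(poisson_pmf(lam, i) for i in range(r, s + 1))
    return binom_pmf(n, p, r) / poisson_pmf(lam, r) * poisson_sum


def bounds8(n, p, r, s, k=1, use_p_prime=False):
    """(8) extended: compute B_r, ..., B_{r+k-1} directly, add B_{r+k} for the lower
    bound, and bound P(r + k <= x <= s) by (5) or (7) for the upper bound."""
    direct = sum(binom_pmf(n, p, i) for i in range(r, r + k))
    lower = direct + binom_pmf(n, p, r + k)
    upper = direct + upper_bound(n, p, r + k, s, use_p_prime)
    return lower, upper


n, p, r, s = 40, 0.1, 10, 40
exact = sum(binom_pmf(n, p, i) for i in range(r, s + 1))
print(f"exact           {exact:.7f}")
print(f"(5)             {upper_bound(n, p, r, s):.6f}")
print("(8) with k = 1  %.6f  %.6f" % bounds8(n, p, r, s, k=1))
print("(8) with k = 2  %.6f  %.6f" % bounds8(n, p, r, s, k=2))
print(f"(7)             {upper_bound(n, p, r, s, use_p_prime=True):.6f}")
```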
1. On Distribution-free Confidence Intervals (Preliminary Report). Wassily Hoeffding, University of North Carolina, Chapel Hill.

The setting is as follows:

- θ(F) is a functional of a distribution function (d.f.) F(x), where x is a real number or a vector, defined over a class 𝔇 of d.f.'s;
- O_n is a random sample from a population with d.f. F(x);
- θ'_n < θ''_n are two functions of O_n;
- α_n = Pr[θ'_n < θ(F) < θ''_n].

Conditions are studied under which, given α, 0 < α < 1, we have either α_n = α or α_n ≥ α or α_n → α, for all F(x) in 𝔇, where 𝔇 is defined independently of the functional form of F(x). Under fairly general conditions we can obtain by "studentization" confidence limits θ'_n, θ''_n such that lim_{n→∞} α_n = α and γ = lim_{n→∞} E√n(θ''_n − θ'_n) exists; γ is minimized by using a least variance estimate of θ(F). If there exists a function k(θ) such that var T_n ≤ k(θ) whenever θ(F) = θ, for all F in 𝔇, we can define confidence limits with a positive lower bound for α_n.

This applies to a number of population characteristics estimated by rank order statistics, such as the coefficients ρ' and τ (estimated by Spearman's and Lindeberg-Kendall's rank correlation coefficients, respectively). In certain cases (including ρ' and τ), θ(F) admits a binomially distributed estimate; then exact confidence limits can easily be obtained. This research was done under an Office of Naval Research contract.


2. On Certain Statistics for Samples of 3 from a Normal Population. Julius Lieblein, National Bureau of Standards, Washington.

In analytical chemistry three determinations are frequently made. Sometimes the average of only the two closest results is reported, the remaining observation being rejected as anomalous. In preparing a critique of this procedure, Dr. W. J. Youden encountered a need for information on certain properties of the distributions of the statistics (x' − x'')/(x_3 − x_1), (x' + x'')/2 and (x' − x'')/2, where x' and x'' (x' > x'') are the two closest of the three determinations. This paper shows how these statistics differ from the ones heretofore treated involving "fixed" order statistics. It gives the distribution of these statistics in random samples of 3 from a normal universe, and lists values of certain of the moments of their distributions.

3. On Multinomial Distributions with Limited Freedom: A Stochastic Genesis of Pareto's and Pearson's Curves. Maria Castellani, University of Kansas City.

The purpose of this paper is to investigate the most probable configuration of N random elements to be distributed in K (K < N) class intervals, where known forces are acting. We shall call these intervals of energy, using the terminology of statistical mechanics. We will prove that the most probable configuration is a configuration of statistical equilibrium, since its probability of occurring converges to 1 as N becomes infinitely large. The main purpose of this paper is to discover which forces of attraction, operating in the intervals of energy, give Pareto's and Pearson's curves when statistical equilibrium is reached.

We will consider a random variable Y(t), t being an independent variable, obeying a multinomial distribution law with limited freedom, and we will exploit the familiar process of statistical mechanics. The equation of the frequency curves corresponding to the equilibrium stage of the statistical experiment will be shown.
4. Fitting Generalized Truncated Normal Distributions. Harold Hotelling, University of North Carolina, Chapel Hill.

In a sample from a p-dimensional normal distribution, only those individuals are supposed to be observed which fall in a specified but arbitrary set A of positive measure. For estimating the parameters, the method of moments is proved equivalent to that of maximum likelihood and therefore efficient. The problem is thus reduced to that of expressing the parameters of the normal distribution in terms of the moments of the truncated distribution. This, however, is not generally possible in simple explicit form. Methods are presented for dealing numerically with several special cases, including those in which A is a linear interval or a parallelogram.


5. On the Distribution of the Two Closest Observations Among a Set of Three Independent Observations. G. R. Seth, Iowa State College.

Let x_1, x_2, x_3 (x_1 < x_2 < x_3) be three independent ordered observations from a population having a probability density function f(x). Let x', x'' (x' < x'') be the two closest; then the probability density function of x', x'' is given by

$$ 6\, f(x')\, f(x'')\,\big[1 - F(2x'' - x') + F(2x' - x'')\big], $$

where

$$ F(x) = \int_{-\infty}^{x} f(x)\, dx. $$

In the case where f(x) is a normal distribution with unit variance, the joint distribution of y = x'' − x' and z = (x'' − x')/(x_3 − x_1) is obtained as

$$ \frac{2\sqrt{3}}{\pi}\,\frac{y}{z^2}\,\exp\!\Big[-\frac{y^2(1 - z + z^2)}{3z^2}\Big], \qquad y > 0,\ 0 < z \le \tfrac{1}{2}. $$

This problem is of interest in cases where the conclusions are to be based on a set of three observations and one of the observations is to be rejected in the analysis of the data.
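The normal-case result can be checked by simulation. Integrating y out of the normal-case joint density of y and z is our own calculation, offered as a sketch. It gives z the marginal density (3√3/π)/(1 − z + z²) on 0 < z ≤ 1/2, whose distribution function is (6/π)[arctan((2z − 1)/√3) + π/6]. The sketch below compares that with the fraction of simulated standard normal triples having z ≤ 1/4; the seed and sample size are arbitrary choices.

```python
import math
import random


def closest_pair_ratio(rng: random.Random) -> float:
    """Draw three N(0, 1) observations; return z = (x'' - x')/(x_3 - x_1) for the closest pair."""
    x1, x2, x3 = sorted(rng.gauss(0.0, 1.0) for _ in range(3))
    return min(x2 - x1, x3 - x2) / (x3 - x1)


def z_cdf(z: float) -> float:
    """Integral from 0 to z of (3*sqrt(3)/pi) / (1 - u + u^2) du, for 0 <= z <= 1/2."""
    return (6 / math.pi) * (math.atan((2 * z - 1) / math.sqrt(3)) + math.pi / 6)


rng = random.Random(12345)
draws = [closest_pair_ratio(rng) for _ in range(100_000)]
empirical = sum(d <= 0.25 for d in draws) / len(draws)
print(round(empirical, 3), round(z_cdf(0.25), 3))
```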


6. The Derivation of Certain Recurrence Formulae and their Application to the Extension of Existing Published Incomplete Beta Function Tables. T. A. Bancroft, Alabama Polytechnic Institute, Auburn (presented by title).

The objects of the paper are (1) to give a number of new recurrence formulae in the incomplete beta function derived by a new method, and (2) to indicate how these new formulae have been used to obtain new tables of the incomplete beta function. These tables lie outside the range of the p and q values given in the existing published tables.

The recurrence formulae have been derived by considering the incomplete beta function as a special case of the hypergeometric series, thus

$$ B_x(p, q) = \frac{x^p}{p}\, F(p, 1 - q, p + 1, x), $$

where the usual form of the hypergeometric series is

$$ F(a, b, c, x) = 1 + \frac{a \cdot b}{c}\,\frac{x}{1!} + \frac{a(a + 1)\cdot b(b + 1)}{c(c + 1)}\,\frac{x^2}{2!} + \frac{a(a + 1)(a + 2)\cdot b(b + 1)(b + 2)}{c(c + 1)(c + 2)}\,\frac{x^3}{3!} + \cdots. $$

The series converges for |x| < 1, and for x = 1 if and only if a + b < c. Certain recurrence formulae for F(a, b, c, x) are then directly converted for use with B_x(p, q), or in the so-called normalized form I_x(p, q), provided c = a + 1. All conditions have been satisfied by setting a = p, b = 1 − q, c = p + 1, and q > 0.

For example, using the above-mentioned methods we may obtain, among many others, the recurrence formulae

$$ \begin{aligned}
&\text{(i)} && x I_x(p, q) - I_x(p + 1, q) + (1 - x) I_x(p + 1, q - 1) = 0,\\
&\text{(ii)} && (p + q - px) I_x(p, q) - q I_x(p, q + 1) - p(1 - x) I_x(p + 1, q - 1) = 0,\\
&\text{(iii)} && q I_x(p, q + 1) + p I_x(p + 1, q) - (p + q) I_x(p, q) = 0.
\end{aligned} $$

Formula (i) is essentially the basic recurrence formula used to obtain Karl Pearson's tables. An indication of formula (iii) in another form was given by the author in the paper "On Biases in Estimation Due to the Use of Preliminary Tests of Significance," Annals of Math. Stat., Vol. 15 (1944), p. 104. A direct proof was later given by the author in "Note on an Identity in the Incomplete Beta Function," Annals of Math. Stat., Vol. 16 (1945), pp. 98-99. All of the material in the present paper, however, is new, including recurrence formulae and tables and the mathematical method of derivation.
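The hypergeometric representation and recurrences (i) to (iii) can be verified numerically for integer p and q, where I_x(p, q) equals a binomial tail probability. The sketch below uses our own helpers and is not Bancroft's procedure. It evaluates I_x both ways; for integer q ≥ 1 the series F(p, 1 − q, p + 1, x) terminates. It then computes the left-hand sides of (i), (ii) and (iii), which should vanish up to rounding.

```python
import math


def I_binomial(x: float, p: int, q: int) -> float:
    """Normalized incomplete beta I_x(p, q) for positive integers p, q, via
    I_x(p, q) = Pr[Bin(p + q - 1, x) >= p]."""
    n = p + q - 1
    return sum(math.comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(p, n + 1))


def I_hypergeometric(x: float, p: int, q: int) -> float:
    """I_x(p, q) = B_x(p, q)/B(p, q) with B_x(p, q) = (x^p / p) F(p, 1 - q, p + 1, x).
    For integer q >= 1 the factor b + k = 1 - q + k vanishes at k = q - 1, ending the series."""
    a, b, c = p, 1 - q, p + 1
    term, total, k = 1.0, 1.0, 0
    while True:
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * x
        if term == 0.0:
            break
        total += term
        k += 1
    complete_beta = math.gamma(p) * math.gamma(q) / math.gamma(p + q)
    return x**p / p * total / complete_beta


def recurrence_residuals(x: float, p: int, q: int):
    """Left-hand sides of (i), (ii), (iii); each should vanish (requires q >= 2)."""
    I = I_binomial
    r1 = x * I(x, p, q) - I(x, p + 1, q) + (1 - x) * I(x, p + 1, q - 1)
    r2 = (p + q - p * x) * I(x, p, q) - q * I(x, p, q + 1) - p * (1 - x) * I(x, p + 1, q - 1)
    r3 = q * I(x, p, q + 1) + p * I(x, p + 1, q) - (p + q) * I(x, p, q)
    return r1, r2, r3


print(recurrence_residuals(0.3, 3, 4))
print(I_binomial(0.3, 3, 4), I_hypergeometric(0.3, 3, 4))
```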


7. Asymptotic Studentization in Testing of Hypotheses. Herman Chernoff, Cowles Commission for Research in Economics.

Suppose H is a hypothesis for which t < c_1(θ) would be a good test if the value of the nuisance parameter θ were known, and θ̂ is an estimate of θ. Then the following method of asymptotic studentization (obtaining critical regions of almost constant size) was suggested by Wald. Consider t < φ(θ̂), where φ(θ) = c_1(θ) + ⋯ + c_r(θ) and

Pr[t < c_1(θ)] = α, Pr[t − c_1(θ̂) < c_2(θ)] = α, ⋯, Pr[t − c_1(θ̂) − ⋯ − c_i(θ̂) < c_{i+1}(θ)] = α.

It is shown that under reasonable conditions this test has the asymptotic property that Pr[t < φ(θ̂)] = α + O(N^{−r/2}), where N is the size of the sample involved or an analogous variable. The same holds for various modifications designed for those cases where the c_i(θ) are difficult to obtain exactly. This property can be extended to the case where θ is a k-dimensional variable.


8. Completeness, Similar Regions, and Unbiased Estimation (Preliminary Report). Erich L. Lehmann and Henry Scheffé, University of California at Los Angeles.

A family 𝔐 of measures M on a space X of points x is defined to be complete if ∫_X f(x) dM = 0 for every M in 𝔐 implies f(x) = 0 except on a set A for which M(A) = 0 for every M in 𝔐. For a given family of measures, the question of completeness may be regarded as the question of unicity of a related functional transform. Classical unicity results are applicable to many families of probability distributions that have been studied by statisticians.

The notion of completeness throws light on the problem of similar regions and the problem of unbiased estimation. The concept of a maximal sufficient statistic (roughly, a sufficient statistic that is a function of all other sufficient statistics) is developed. A constructive method of finding such is given, which seems to apply to all examples ordinarily considered in statistical theory. A relation between completeness and maximality is found.


9. On a Proposed Method for Estimating Populations. Cecil C. Craig, University of Michigan, Ann Arbor.

It was proposed to the author by a biologist that a method be devised for estimating the total population in an area which shall utilize the minimum distances between randomly chosen individuals and their neighbors in directions lying in each of the four quadrants. Assume that the area is a square and that the distribution law over it is rectangular. It then turns out that the complete distribution of the lengths of sides of minimum squares which contain a second individual is simpler than that of minimum distances. In both cases a simple estimate is found which uses most but not all of the information in a sample. Its efficiency is comparable to that of an estimate based on a complete enumeration of a sample area, though such an enumeration is not always possible.

10. Some Results on the Asymptotic Distribution of Maximum- and Quasi-Maximum-likelihood Estimates. Herman Rubin, Institute for Advanced Study.

The author investigates the asymptotic normality of maximum- and quasi-maximum-likelihood estimates of parameters of systems of linear stochastic difference equations. The principal tool is the extension of the Central Limit Theorem to dependent variables previously obtained by the author (presented to the American Mathematical Society in April, 1948). The results obtained are analogous to those in the case in which no differences are present. Some extensions are also made to systems of stochastic difference equations linear in the coefficients but not necessarily in the variables. If the complete system of stochastic difference equations is linear in the jointly dependent variables, asymptotic efficiency is demonstrated for maximum-likelihood estimates.


11. The Probability Points of the Distribution of the Median in Random Samples from Any Continuous Population. Churchill Eisenhart, Lola S. Deming, and Celia S. Martin, National Bureau of Standards, Washington.

Consider the (one-tail) ε-probability point of the distribution of the median in random samples of size n = 2m + 1 (m ≥ 0) from any continuous population. Its abscissa is identical with the abscissa of the corresponding P_{ε,n}-probability point of the parent distribution, where P_{ε,n} is determined by relation (1), 0 ≤ ε ≤ 1.

From (1) it follows that

$$ P_{1-\varepsilon, n} = 1 - P_{\varepsilon, n} \tag{2} $$

and that

$$ P_{\varepsilon, n} = x_\varepsilon(n + 1, n + 1) = \frac{1}{1 + F_\varepsilon(n + 1, n + 1)} = \frac{1}{1 + e^{2 z_\varepsilon(n + 1, n + 1)}}, \tag{3} $$

where x_ε(ν_1, ν_2), F_ε(ν_1, ν_2) and z_ε(ν_1, ν_2) denote the ε-probability points of, respectively:

- the incomplete-beta-function distribution,
- Snedecor's F-distribution, and
- Fisher's z-distribution,

each for ν_1 (= 2q) and ν_2 (= 2p) "degrees of freedom".

The foregoing results are certainly not "new".

- Harry S. Pollard implicitly utilized the first equality on the extreme left of (3) in his doctoral dissertation at the University of Wisconsin in 1933 (see Annals of Math. Stat., Vol. 5 (1934), p. 250).
- John H. Curtiss has given the generalization of (1) appropriate to the case of the "rth position" in random samples from any continuous population (see Amer. Math. Monthly, Vol. 50 (1943), p. 103). He utilized (3) explicitly to obtain the 5% point of the distribution of the median in random samples of size n = 23.

The aim of the present paper is to give these results somewhat greater publicity; they are hardly "well known". To this end a table (Table 1) is given of the values of P_{ε,n} to 5 significant figures for ε = 0.001, 0.005, 0.01, 0.025, 0.05, 0.10, 0.20, 0.25 and n = 3(2)15(10)95. It is accompanied by expressions from which P_{ε,n} can be evaluated accurately and conveniently for values of n (and ε) not included in the table. Numerical examples illustrate the use of the table and formulas. Concise derivations of the fundamental relations and formulas are given in an appendix.
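Without tables at hand, P_{ε,n} can also be found by bisection. This works because F(median) follows a Beta(m + 1, m + 1) distribution, and for integer parameters the incomplete beta function is a binomial tail. Relation (1) is not legible in our source, so the sketch below assumes the lower-tail convention Pr[F(median) ≤ P_{ε,n}] = ε; under the opposite convention the roles of ε and 1 − ε swap, as (2) shows. The helper names are ours.

```python
import math


def median_level(P: float, m: int) -> float:
    """Pr[F(median) <= P] for n = 2m + 1: F(median) ~ Beta(m + 1, m + 1), so this is
    I_P(m + 1, m + 1) = Pr[Bin(2m + 1, P) >= m + 1]."""
    n = 2 * m + 1
    return sum(math.comb(n, k) * P**k * (1 - P) ** (n - k) for k in range(m + 1, n + 1))


def P_eps(eps: float, n: int, iters: int = 60) -> float:
    """P_{eps,n} under the lower-tail convention (an assumption): I_P(m+1, m+1) = eps."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be odd, n = 2m + 1")
    m = (n - 1) // 2
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if median_level(mid, m) < eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n in (3, 15, 95):
    print(n, round(P_eps(0.05, n), 5), round(P_eps(0.95, n), 5))
```

For n = 3 the defining equation is 3P² − 2P³ = ε, which gives a quick hand check of the first line printed.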


12. On the Arithmetic Mean and the Median in Small Samples from the Normal and Certain Non-Normal Populations. Churchill Eisenhart, Lola S. Deming, and Celia S. Martin, National Bureau of Standards, Washington.

Let x̄_{ε,n} and x̃_{ε,n} denote the abscissae of the one-tail ε-probability points of the arithmetic mean and the median, respectively. More specifically, they are the abscissae exceeded with probability ε by the mean and the median in random samples of size n (= 2m + 1) from any specified population. Let σ_{x̄,n} and σ_{x̃,n} denote the standard deviations of the mean and the median in such samples, respectively. The following symmetrical populations with zero location parameters and unit scale parameters are considered in this paper:

normal (Gaussian): (2π)^{−1/2} e^{−x²/2}, −∞ < x < ∞;
double-exponential (Laplace): ½ e^{−|x|}, −∞ < x < ∞;
rectangular (uniform);
Cauchy: 1/[π(1 + x²)], −∞ < x < ∞;
sech: (1/π) sech x, −∞ < x < ∞;
sech² (derivative of "logistic"): ½ sech² x, −∞ < x < ∞.

Tables of x̄_{ε,n} and x̃_{ε,n} for the aforementioned combinations of population, ε and n give precise numerical meaning to the accuracy of the median as an estimator of the center of a population in samples of any odd size (n = 2m + 1).

For the first case (normal population), values of the ratio R_{ε,n} = x̃_{ε,n}/x̄_{ε,n} are given to 4 decimal places for the above combinations of ε and n. They are accompanied by the best available values of σ_{x̃,n}/σ_{x̄,n} for n = 3(2)15(10)65.

- When ε is small, the ratio R_{ε,n} exceeds the ratio σ_{x̃,n}/σ_{x̄,n}. This shows that the "tails" of the exact distribution of the median are "longer" than the tails of the normal distribution with the same mean and variance.
- When 0.05 ≤ ε ≤ 0.25, the ratio R_{ε,n} is less than σ_{x̃,n}/σ_{x̄,n}.

An argument shows that the point of equality is close to the 0.012-probability point. A method for computing σ_{x̃,n}, based on the foregoing, is also given.

In the case of the double-exponential distribution, values of x̃_{ε,n} are given to 4 decimal places for n = 3(2)11 and ε = 0.005, 0.01, 0.025, 0.05, 0.10, 0.25, for comparison with the corresponding values of x̄_{ε,n}. The comparison runs as follows:

- In samples of 3 from a double-exponential distribution, the mean is "better" than the median at the 0.95, 0.98 and 0.99 levels of confidence.
- When n = 5, the mean is "better" at the 0.98 and 0.99 levels of confidence.
- When n = 7, the mean is "better" at the 0.99 level.
- For all other combinations of ε and n (> 3), the median is "better".

In the case of the rectangular distribution, valuer of j\.„ are tabulated to 4 decimals for 
w m 3(2)9, anil values of x,.„ , the (-probability point of the mid-range in samples of n, 
fora * 3{2)15(10j95, in each instance for e *» 0.005,0 01,0.025,0.05,0.10,0.25, and in the case 
of 2„„ for ( ™ 0 001 also. The superiority of the midrange over the mean and the median, 
well-known but here exhibited numerically for the first time, in truly nmaaing. 

It is planned to provide the corresponding values for samples from the sech and sech² distributions in the final paper.
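The superiority of the mid-range for the rectangular distribution can be seen from standard variance formulas for samples of odd size n from U(0, 1): Var(mean) = 1/(12n), Var(median) = 1/(4(n + 2)), and Var(mid-range) = 1/(2(n + 1)(n + 2)). The sketch below compares variances rather than the ε-probability points tabulated in the paper, and checks the formulas with a seeded simulation. The function names, seed, and trial count are illustrative choices.

```python
import random
from statistics import median, pvariance


def exact_variances(n: int) -> dict:
    """Variances of three location estimators for n observations from U(0, 1), n odd."""
    return {
        "mean": 1 / (12 * n),
        "median": 1 / (4 * (n + 2)),             # median ~ Beta(m + 1, m + 1), n = 2m + 1
        "midrange": 1 / (2 * (n + 1) * (n + 2)),  # from Var(min), Var(max), Cov(min, max)
    }


def simulated_variances(n: int, trials: int = 20_000, seed: int = 7) -> dict:
    """Monte Carlo estimates of the same three variances."""
    rng = random.Random(seed)
    values = {"mean": [], "median": [], "midrange": []}
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        values["mean"].append(sum(xs) / n)
        values["median"].append(median(xs))
        values["midrange"].append((min(xs) + max(xs)) / 2)
    return {name: pvariance(v) for name, v in values.items()}


exact = exact_variances(9)
sim = simulated_variances(9)
for name in exact:
    print(f"{name:9s} exact {exact[name]:.6f}  simulated {sim[name]:.6f}")
```

For n = 9 the mid-range variance (1/220) is less than half that of the mean (1/108) and about a fifth of that of the median (1/44).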

13. The Relative Frequencies with which Certain Estimators of the Standard
Deviation of a Normal Population Tend to Underestimate its Value.

Churchill Eisenhart and Celia S. Martin, National Bureau of Standards,
Washington.

Let $x_1, x_2, \ldots, x_n$ denote a random sample of n independent observations from a normal population with mean a and standard deviation σ. Common estimators of σ are

$s_1 = \sqrt{\sum (x_i - \bar{x})^2 / n}$, $s_2 = s_1\sqrt{n/(n-1)}$, $s_3 = s_1/c_2$,

$m_1 = \sqrt{\pi/2}\,\sum |x_i - \bar{x}| / n$, $m_2 = m_1\sqrt{n/(n-1)}$,

and $R_1 = (x_L - x_S)/d_2$, where $\bar{x} = \sum x_i/n$, $x_L$ and $x_S$ are the largest and the smallest of the x's, $c_2\sigma = E(s_1)$, and $d_2\sigma = E(x_L - x_S)$, the symbol E( ) denoting “mathematical expectation (or mean value) of.” A table is given that shows to 3 decimals the relative frequencies (probabilities) with which these estimators tend to underestimate σ when n = 2(1)10, 12, 16, 20, 24, 30, 40, [...]. The results show, among other things, that for very small samples (n ≤ 10) such as chemists and physicists commonly use, Bessel's formula for the probable error, which is based on $s_2$, has a marked downward bias in the probability sense (in addition to its known slight downward bias in the mean-value sense), whereas Peters' formula, which is based on $m_2$, has only a slight downward bias in the probability sense and no bias in the mean-value sense. A table of divisors is given by means of which “median estimators” of σ can be computed readily from the basic quantities $\sum_{i=1}^{n} (x_i - \bar{x})^2$, $\sum_{i=1}^{n} |x_i - \bar{x}|$, and $(x_L - x_S)$, that is, estimators that will over- and underestimate σ equally often in repeated use. An application to control charts is noted. Median estimators, like maximum likelihood estimators (“modal estimators”), have the useful property that if $T_1$ is a median estimator of θ, then $f(T_1)$ is a median estimator of $f(\theta)$, a property unfortunately not possessed by the customary “unbiased” (“mean”) estimators.
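Relative frequencies of this kind can be approximated by simulation. The sketch below assumes σ = 1. It uses s₂ (divisor n − 1, the basis of Bessel's formula) and m₂ = √(π/2) Σ|xᵢ − x̄| / √(n(n − 1)) (the basis of Peters' formula), and counts how often each falls below σ. For s₂ there is an exact check: (n − 1)s₂²/σ² is chi-square with n − 1 degrees of freedom, so for n = 3 the probability is 1 − e⁻¹ ≈ 0.632. The function name, trial count, and seed are arbitrary choices, not taken from the paper.

```python
import math
import random


def underestimation_frequencies(n: int, trials: int = 100_000, seed: int = 1) -> tuple:
    """Estimate Pr(s2 < sigma) and Pr(m2 < sigma) for normal samples of size n (sigma = 1).

    s2 = sqrt(sum((x - xbar)**2) / (n - 1))             (basis of Bessel's formula)
    m2 = sqrt(pi / 2) * sum(|x - xbar|) / sqrt(n(n-1))  (basis of Peters' formula)
    """
    rng = random.Random(seed)
    peters_factor = math.sqrt(math.pi / 2) / math.sqrt(n * (n - 1))
    below_s2 = below_m2 = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(xs) / n
        s2 = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
        m2 = peters_factor * sum(abs(x - xbar) for x in xs)
        below_s2 += s2 < 1.0  # bool counts as 0 or 1
        below_m2 += m2 < 1.0
    return below_s2 / trials, below_m2 / trials


p_s2, p_m2 = underestimation_frequencies(3)
# Exact reference for s2 at n = 3: Pr(chi-square with 2 d.f. < 2) = 1 - exp(-1).
print(round(p_s2, 3), round(1 - math.exp(-1), 3), round(p_m2, 3))
```

Both frequencies exceed 1/2, but the one for s₂ is noticeably larger, which is the contrast drawn between Bessel's and Peters' formulas.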

14. Some Non-Parametric Tests of Whether the Largest Observations of a Set
are too Large. (Preliminary Report.) John E. Walsh, Douglas Aircraft
Company, Santa Monica, California.

Let x(1), …, x(n) represent the values of n observations arranged in increasing order of magnitude. By hypothesis these observations have the following properties. (1) They are independent and from continuous symmetrical populations. (2) For large n the variances of the tail order statistics are either very large or very small compared with the variances of the central order statistics. (3) For large n the tail order statistics are approximately independent of the central order statistics. (4) Each observation is from a population whose median is either θ or φ, where x(n − r + 1), …, x(n) are from populations with median θ while the central and smaller order statistics are from populations with median φ. The test is: Accept φ < θ if $\min[x(n - i_k) + x(j_k),\ 1 \le k \le s \le r] > 2x(t_\alpha)$, where $i_k < i_{k+1}$, $j_k < j_{k+1}$, [...], and $t_\alpha$ is defined by $\Pr[x(t_\alpha) < \varphi \mid \theta = \varphi] = \alpha$. Here

$\alpha = \Pr[\min[x(n - i_k) + x(j_k),\ 1 \le k \le s \le r] > 2\varphi \mid \theta = \varphi]$.

For large n the significance level of the test is approximately α, while the significance level does not exceed 2α for any value of n. Suitable values of α can be obtained for r > 4. As θ − φ → −∞ the power function tends to zero, while the power function tends to unity as θ − φ → ∞. For θ − φ < 0 the power function is monotonically increasing.


15. On the Bounded Significance Level Properties of the Equal-tail Sign Test
for the Mean. John E. Walsh, Douglas Aircraft Company, Santa Monica,
California. (Presented by title.)


The equal-tail sign test for deciding whether the population mean μ is equal to a given hypothetical value $\mu_0$ is defined by: Accept $\mu \ne \mu_0$ if either $x_i < \mu_0$ or $x_{n+1-i} > \mu_0$.

Here $x_j$ ($j = 1, \ldots, n$) is the jth largest of n independent observations drawn from n populations which satisfy the following conditions. (i) The mean of each population has the value μ. (ii) Each population is continuous at its mean. (iii) The mean is at a 50% point for each population. This paper investigates how the significance level of the equal-tail sign test varies when (i)-(iii) are not satisfied. It is found that the significance level does not differ noticeably from its hypothetical value under conditions much more general than (i)-(iii). This significance level stability, combined with the properties of being easily applied and reasonably efficient for small samples from a normal population, suggests that the equal-tail sign test be considered for application whenever the population mean is to be tested on the basis of a small number of observations.
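Under conditions (i)-(iii), μ₀ is a 50% point of every population. The count of observations above μ₀ is therefore binomial with p = 1/2, and the nominal significance level follows directly. The sketch below assumes the test reads "reject μ = μ₀ when xᵢ < μ₀ or x₍ₙ₊₁₋ᵢ₎ > μ₀, with xⱼ the jth largest observation", with 1 ≤ i ≤ n/2. The function names are hypothetical. It computes only the nominal level; the paper's subject is how the actual level behaves when (i)-(iii) fail.

```python
from math import comb


def sign_test_level(n: int, i: int) -> float:
    """Nominal significance level of the equal-tail sign test under (i)-(iii).

    x_i < mu0 means at most i - 1 observations lie above mu0, and
    x_{n+1-i} > mu0 means at most i - 1 lie below; each event has
    probability Pr(Binomial(n, 1/2) <= i - 1).
    """
    if not 1 <= i <= n // 2:
        raise ValueError("need 1 <= i <= n // 2")
    one_tail = sum(comb(n, k) for k in range(i)) / 2**n
    return 2 * one_tail


def sign_test_rejects(observations, mu0: float, i: int) -> bool:
    """True when the test accepts mu != mu0 (i.e. rejects mu = mu0)."""
    xs = sorted(observations, reverse=True)  # xs[j - 1] is x_j, the j-th largest
    n = len(xs)
    if not 1 <= i <= n // 2:
        raise ValueError("need 1 <= i <= n // 2")
    return xs[i - 1] < mu0 or xs[n - i] > mu0


print(sign_test_level(10, 2))                    # 0.021484375, i.e. 22/1024
print(sign_test_rejects(range(1, 11), 5.5, 2))  # False
print(sign_test_rejects(range(1, 11), 0.5, 2))  # True
```

With n = 10 and i = 2 the test rejects only when at most one observation falls on one side of μ₀, giving a nominal level of about 0.0215.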


16. Infinitely Divisible Distributions. William Feller, Cornell University, 
Ithaca, New York. 

A simple derivation of P. Lévy's formula is given starting from the following definition: a distribution function F(x) is infinitely divisible if for every n it is possible to find finitely many distributions $F_{k,n}(x)$ such that $F(x) = F_{1,n}(x) * \cdots * F_{n,n}(x)$ and that $F_{k,n}(x)$ tends to the unitary distribution uniformly in k. This definition is more general than the one used by P. Lévy and Khintchine. The equivalence of the two definitions was proved by Khintchine by deep methods. The new approach renders the equivalence obvious. Furthermore, a new characterization of infinitely divisible distributions is given; it is equivalent to Gnedenko's characterization but requires no special analytical tools.
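The definition can be illustrated with the Poisson distribution, a standard example of an infinitely divisible law. Poisson(λ) is the n-fold convolution of Poisson(λ/n), and each component puts mass e^(−λ/n) → 1 at zero, so the components tend to the unitary distribution. This is a numerical sketch of the definition only, not of Feller's derivation of Lévy's formula. The helper names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(lam: float, kmax: int) -> list:
    """Poisson(lam) probabilities for k = 0, ..., kmax."""
    return [exp(-lam) * lam**k / factorial(k) for k in range(kmax + 1)]


def convolve(p: list, q: list) -> list:
    """Convolution of two pmfs on 0, ..., kmax (lists of equal length).

    Both distributions live on the non-negative integers, so the
    truncated result is exact for every k <= kmax.
    """
    kmax = len(p) - 1
    return [sum(p[j] * q[k - j] for j in range(k + 1)) for k in range(kmax + 1)]


def n_fold_convolution(component: list, n: int) -> list:
    """F_{1,n} * ... * F_{n,n} with all n components equal to `component`."""
    result = [1.0] + [0.0] * (len(component) - 1)  # start from the unitary distribution at 0
    for _ in range(n):
        result = convolve(result, component)
    return result


lam, n, kmax = 3.0, 50, 30
component = poisson_pmf(lam / n, kmax)
total = n_fold_convolution(component, n)
target = poisson_pmf(lam, kmax)
print(max(abs(a - b) for a, b in zip(total, target)))  # agreement up to rounding error
print(component[0])  # Pr(component = 0) = exp(-lam / n), close to 1
```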


17. Fluctuation Theory of Recurrent Events. William Feller, Cornell University,
Ithaca, New York.

Consider a sequence of independent or dependent trials, but suppose that each has a discrete sample space. The paper studies recurrent patterns ℰ which can be roughly characterized by the property that after every occurrence of ℰ the process starts from scratch, the conditional probabilities coinciding with the original absolute probabilities. Typical examples are success runs, returns to equilibrium, zeros of sums of independent variables, and passages through a state in a Markov chain. New methods are developed, unifying and simplifying previous theories and applying to larger classes of recurrent events. It is shown in an elementary way that the probability that ℰ occurs at the nth trial either has a limit or is asymptotically periodic. This theorem has many consequences. For example, the ergodic properties of discrete Markov chains follow in a few lines, and the difference between finite and infinite chains disappears. Several theorems of the renewal type are proved. Weak and strong limit theorems for the number $N_n$ of occurrences of ℰ in n trials are derived, shedding new light on stable distributions.
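The "starts from scratch" property means the probabilities uₙ that ℰ occurs at trial n satisfy the renewal equation u₀ = 1, uₙ = f₁uₙ₋₁ + ⋯ + fₙu₀, where fₖ is the probability that ℰ first occurs at trial k. The sketch below assumes this standard formulation (the function name is hypothetical). It shows both behaviours named above: a limit equal to 1/μ, with μ the mean recurrence time, in an aperiodic case; and periodic behaviour when ℰ can only recur every second trial.

```python
def occurrence_probabilities(f: list, n_max: int) -> list:
    """u[n] = Pr(E occurs at trial n) for n = 0, ..., n_max.

    f[k] is the probability that E occurs for the first time at trial k
    (f[0] is ignored). Because the process starts from scratch after each
    occurrence, u[0] = 1 and u[n] = sum over k of f[k] * u[n - k].
    """
    u = [1.0]
    for n in range(1, n_max + 1):
        u.append(sum(f[k] * u[n - k] for k in range(1, min(n, len(f) - 1) + 1)))
    return u


# Aperiodic case: recurrence times 1 or 2 with probability 1/2 each (mean 3/2),
# so u[n] should tend to 1 / (3/2) = 2/3.
aperiodic = occurrence_probabilities([0.0, 0.5, 0.5], 60)
# Periodic case: E can only recur every second trial, so u alternates 1, 0, 1, 0, ...
periodic = occurrence_probabilities([0.0, 0.0, 1.0], 6)
print(aperiodic[-1], periodic)
```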

18. Formulas for the Percentage Points of the Distributions of the Arithmetic
Mean in Random Samples from Certain Symmetrical Universes. Uttam
Chand, University of North Carolina and National Bureau of Standards.

Using the method of Fisher and Cornish, the 100ε% point of the distribution of the arithmetic mean in random samples of size N from any universe having finite cumulants of the first four orders, $\kappa_1, \kappa_2, \kappa_3, \kappa_4$, is expressed to order $1/N^2$ as a function of N, the 100ε% point of a standardized normal deviate, and the quantities $\kappa_1$, $\kappa_2$, $\kappa_3/\kappa_2^{3/2}$, $\kappa_4/\kappa_2^2$. The numerical coefficients are evaluated for the cases of sampling from rectangular, double-exponential, sech and sech² distributions. The application of the resulting formulas is illustrated numerically for ε = .001, .005, .010, .025, .050, .100, and .250. In the case of the rectangular and double-exponential distributions, the results obtained for N = 10 are compared with accurate values, indicating the accuracy of the formulas.
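A minimal sketch of the Cornish-Fisher idea for the mean, keeping only the leading 1/N correction. For a symmetric universe κ₃ = 0, and the standardized mean has excess kurtosis γ₂/N with γ₂ = κ₄/κ₂². The first-order correction to the normal point z is then (γ₂/(24N))(z³ − 3z). The higher-order 1/N² terms used in the paper are not reproduced here. As a check, the point for a rectangular universe (γ₂ = −6/5) with N = 10 is compared with the exact distribution of a sum of uniforms (the Irwin-Hall distribution). The function names are illustrative.

```python
from math import comb, factorial, floor, sqrt
from statistics import NormalDist


def cf_upper_point_of_mean(eps: float, n: int, gamma2: float) -> float:
    """Upper 100*eps% point of the standardized mean (xbar - mu) / (sigma / sqrt(n)).

    Assumes a symmetric parent (kappa3 = 0) with excess kurtosis
    gamma2 = kappa4 / kappa2**2; the mean then has excess kurtosis gamma2 / n.
    Only the 1/n Cornish-Fisher term is kept.
    """
    z = NormalDist().inv_cdf(1 - eps)
    return z + (gamma2 / n) / 24 * (z**3 - 3 * z)


def irwin_hall_cdf(s: float, n: int) -> float:
    """Exact CDF of the sum of n independent U(0, 1) variables."""
    if s <= 0:
        return 0.0
    if s >= n:
        return 1.0
    total = sum((-1) ** k * comb(n, k) * (s - k) ** n for k in range(floor(s) + 1))
    return total / factorial(n)


# Rectangular parent: gamma2 = -6/5. A sum of n U(0, 1) draws has mean n/2 and variance n/12.
n, eps = 10, 0.05
x = cf_upper_point_of_mean(eps, n, gamma2=-1.2)
s = n / 2 + x * sqrt(n / 12)
exact_tail = 1 - irwin_hall_cdf(s, n)
print(round(x, 4), round(exact_tail, 4))
```

The same function applies to the double-exponential universe with γ₂ = 3. With γ₂ = 0 it reduces to the plain normal point.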



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Professor T. A. Bickerstaff has been appointed Chairman of the Department 
of Mathematics at the University of Mississippi. 

Professor Raj Chandra Bose has resigned as head of the graduate Department 
of Statistics of the University of Calcutta, and has been appointed Professor of 
Mathematical Statistics at the University of North Carolina beginning in the 
winter of 1949. Professor Bose is an authority on the design of experiments 
and is writing a book on the combinatorial mathematics of the subject. He has 
also published extensive contributions to differential geometry and to multivariate statistical analysis, and has been instrumental in developing practical sample surveys. He served as Visiting Professor in the Institute of Statistics at North Carolina in the winter and spring of 1948.

Mr. Hamilton Brooks’s paper, “The Probable Breakdown Voltage of Paper Dielectric Capacitors,” was one of the four papers selected for a national award by the American Institute of Electrical Engineers. His paper presents the statistical treatment of an engineering problem and shows by experiment how insulation strength distribution is determined by the distribution of the extreme size of flaws.

Dr. C. West Churchman, formerly a member of the staff at the University of Pennsylvania, was appointed Associate Professor of Philosophy at Wayne University, Detroit 1, Michigan, starting February 1, 1948.

Dr. William G. Cochran has accepted an appointment as Professor of Biostatistics in the School of Hygiene and Public Health of the Johns Hopkins University and will assume this post in September. Dr. Cochran, a native of Glasgow, Scotland, comes to Johns Hopkins from the University of North Carolina, where he served as Associate Director of the Institute of Statistics from 1946 until the present.

Dr. Louis M. Court has been promoted to an assistant professorship in the Mathematics Department of Rutgers University.

Dr. Donald A. Darling, formerly a member of the staff at Cornell University, has accepted an assistant professorship at Rutgers University.

Mr. Aryeh Dvoretzky has been appointed a member of the Institute for Advanced Study, Princeton, New Jersey, for the 1948-1949 academic year.

Mr. Arnold King, formerly Director of Research in Statistical Methodology for the Bureau of Agricultural Economics at Iowa State College, was appointed Managing Director of National Analysts, Inc., Philadelphia, on July 1, 1948.

Mr. Charles L. Marks has resigned his position as instructor of mathematics at the University of North Carolina to accept a teaching appointment in the Department of Statistics, The George Washington University, Washington 6, D. C.

Miss Doris Newman has accepted an appointment at the U. S. Naval Medical Research Laboratory, U. S. Naval Submarine Base, New London, Conn.

Dr. Ernest Rubin has been transferred from the Immigration and Naturalization Service, General Research Section, Washington, D. C., to the European Branch, Areas Division, Office of International Trade in the Department of Commerce as an Economic Statistician.

Mr. David Rubinstein has been promoted from Junior Research Assistant in the Statistical Laboratory, University of California, Berkeley, to a Teaching Assistant.

Miss Elizabeth L. Scott, formerly an Associate and Research Assistant in the 
Statistical Laboratory, University of California, Berkeley, has been promoted to 
Lecturer and Research Assistant. 

Dr. Gobind R. Seth, who was formerly a student at Columbia University, has 
accepted an associate professorship in statistics at the Statistical Laboratory, 
Iowa State College. 

Dr. Charles M. Stein has been promoted to an assistant professorship in the Statistical Laboratory, University of California, Berkeley.

Professor Gerhard Tintner is on leave of absence for one year from the Iowa State College to join the Department of Applied Economics at Cambridge University, Cambridge, England, as a Research Associate.

Mr. L. H. C. Tippett, Chief Statistician of the British Cotton Industry Research Association, delivered twelve one-hour lectures on Statistical Quality Control and Industrial Experimentation at a conference at the Massachusetts Institute of Technology, May 5-14, before a large audience. Dr. W. A. Shewhart of the Bell Telephone Laboratories addressed a large audience on the Future of Statistics in Industrial Research and Quality Control on May 14 at the same conference.


Scientists and Reserve Officers 

The Department of the Army has established a program of particular interest 
to statisticians and other scientists who hold Reserve commissions in the Army, 
and who are professionally engaged in teaching or research and development. 

The objectives of the program are to: 

(1) maintain the useful affiliation of statisticians and other scientists with the 
Organized Reserve Corps, 

(2) provide peacetime Reserve assignments for these officers, enabling optimum utilization of their education, experience and skills,

(3) furnish mobilization assignments which will fully utilize their talents, and 

(4) adequately prepare these officers for mobilization. 

The Technical Services of the Department of the Army submit to these Research and Development Reserve Groups research problems and projects which pose an intellectual challenge to members of the group. Thus, the program provides members of each group a type of training which is in keeping with their scientific and technical interests and competence, rather than a traditional kind of training session in which scientists have little or no interest.

The program is now being implemented only in those areas where there is a definite local interest. To date, eighteen Research and Development Reserve groups have been organized. Twelve additional groups are in process of organization. Others are in the initial stages of formation. Several of these groups have been formed in communities in which large universities, industrial research laboratories, or private research foundations are located. Typical localities are Chicago, Illinois; Wilmington, Delaware; Newark, New Jersey; Houston, Texas; Washington, D. C.; Manhattan and Lawrence, Kansas; Champaign-Urbana, Illinois; Pittsburgh, Pennsylvania; Denver, Colorado; and Detroit, Michigan.

Provision is made to submit research projects of interest to all categories of scientists: chemists, physicists, engineers, geologists, geographers, psychologists, mathematicians, statisticians and all of the biological scientists.

Reserve officers who are currently engaged in civilian research, college or university teaching, or industrial research or development, or who in the past have had specific research experience are eligible to make application for assignment to an Organized Reserve Research and Development Group. A group may be organized in any locality where there are twenty (20) or more qualified officer scientists who desire to participate in the program. A subgroup may be organized with ten (10) qualified members.

The program is under the general direction of the Research and Development 
Group, Logistics Division, General Staff, United States Army. The entire 
program is outlined in Department of the Army Circular Number 127, dated 5 May 1948.

Inquiry about organization of an Organized Reserve Research and Development Group or about assignment to a group already organized should be made of the Unit Instructor, ORC, or of the Senior Army Instructor, ORC, in the locality in which the officer resides. In localities in which a group has already been organized, the Commanding Officer of the group will consider applications for assignment of additional officers.


New Members 

The following persons have been elected to membership in the Institute
(June 1 to August 16, 1948):

[...] Hjalmar [...] (Univ. of Oregon Medical School) Student, Turner, Oregon.

Banerjee, Kali Shankar, M.A. (Calcutta Univ.) Statistician, Central Sugar Cane Research [...]

[...] Physicist with Naval [...]

Cowan, [...] War Department, 80 Lewis Street, East Lynn, Massachusetts.

[...] Research Associate, Educational Testing [...]

[...] of Buffalo, 168 Winspear Avenue, Buffalo [...], New York.
Hofmann, John E., A.M. (Univ. of [...]) Senior Research Fellow, [...] Oakland Street, Ames, Iowa.

Kimball, Allyn W., Jr., B.S. (Univ. of Buffalo) Research Statistician, Department of Biometrics, School of Aviation Medicine, Randolph Field, Texas.

King, Edgar P., Jr., B.S. (Carnegie Institute of Technology) Teaching Assistant in Mathematics, Department of Mathematics, Carnegie Institute of Technology, Pittsburgh 13, Pennsylvania.

Link, Curtis K., B.S. (Univ. of Oregon) Graduate Student Assistant, 760 [...] 6th Street, Eugene, Oregon.

Loider, Nathan, B.A. (College of the City of N. Y.) Mathematician P-2, [...] Summit Pl., N.W., Washington 9, D. C.

Manos, Nicholas E., M.A. (Univ. of Calif.) Meteorologist and Statistician, [...] Rhode Island Avenue, N.W., Washington 5, D. C.

Peters, Stefan, Ph.D. (Erlangen, Germany) Lecturer at the University of California, [...] Peralta Avenue, Berkeley [...], California.

Petrou, Nicholas V., M.Sc. (Harvard Univ.) Electrical Engineer, Project Engineer, Westinghouse Electric Corporation, [...], Pittsburgh [...], Pennsylvania.

Prakash, Aditya, M.A. (Univ. of Michigan) Student, c/o Mathematics Department, University of Michigan, Ann Arbor, Michigan.

Read, Robert R., B.S. (Oregon State College) Apprentice Engineer, Inventory and Costs Division, Pacific Telephone and Telegraph Company, 9007 N.E. [...], Portland [...], Oregon.

Seiden, Esther, M.A. (Vilna, Poland) Research Assistant, Statistical Laboratory, University of California, [...] Street, Berkeley 5, California.

Sodano, John J., B.S. (Queens College) Student, Mathematical Statistics, Columbia University, [...] Avenue, Jamaica 3, New York.

Stillinger, Richard C., M.S. (Univ. of Michigan) Graduate Student, 1308 [...] Court, Willow Run, Michigan.

Swan, Albert W., B.A.Sc. (Univ. of Toronto) Statistical Section, Research and Development Department, The United Steel Company Limited, c/o The United Steel Companies Ltd., 17 Westbourne Road, Sheffield 10, England.

Tate, Robert F., A.B. (Univ. of Calif.) Teaching Fellow, Department of Mathematical Statistics, Phillips Hall, Chapel Hill, North Carolina.

Teichroew, Dan, B.A. (Univ. of Toronto) Division of Research, Department of Lands and Forests, South Baymouth, Ontario, Canada.

Tyler, Leona E., Ph.D. (Univ. of Minnesota) Associate Professor of Psychology, Department of Psychology, University of Oregon, Eugene, Oregon.



ADOPTION OF THE NEW CONSTITUTION 

The chief order of business at the business meeting of the Institute held at Madison, Wisconsin, on September 10, 1948, was the adoption of the new Constitution. The draft mailed to the members in August, 1948, was adopted unanimously after two changes had been made. They were: (1) the insertion of the word “Article” before each of the respective articles and (2) the elimination of the first “the” in the third line and fourth paragraph of Article 4.

Other business transacted at the meeting included a report of the Secretary- 
Treasurer on the financial condition of the Institute indicating that while the 
Institute is just operating within its income during 1948, steps will have to be 
taken to provide the additional revenue needed for 1949. It was decided not to 
raise dues for 1949 but to attempt to raise additional funds by: (1) an immediate 
appeal to universities and other institutions which are sponsoring research in 
mathematical statistics for contributions to the Institute and (2) an appeal to 
the members of the Institute to make additional contributions at the time of the 
payment of their annual dues. 

Other matters under consideration at the meeting included a reading and discussion of a proposed revision of the By Laws, the announcement of the dates
and locations of future meetings of the Institute and the passing of a resolution 
of thanks to those contributing to the success of the Madison meeting. 

A copy of the official minutes of this meeting may be obtained on request from 
the Secretary-Treasurer. 

P. S. Dwyer
Secretary-Treasurer
The Eleventh Summer Meeting of the Institute of Mathematical Statistics was held at the University of Wisconsin, Madison, Wisconsin, Tuesday, September 7 through Friday, September 10, 1948. The meeting was held in conjunction with the summer meetings of the American Mathematical Society, the Mathematical Association of America and the Econometric Society. The following eighty members of the Institute attended the meeting:

C. B. Allendoerfer, V. L. Anderson, K. J. Arnold, H. M. Bacon, A. S. Barr, Walter Bartky, H. P. Beard, A. A. Bennett, T. A. Bickerstaff, J. H. Bimhny, Maria Castellani, Uttam Chand, Herman Chernoff, C. C. Craig, J. H. Curtiss, G. B. Dantzig, D. B. DeLury, J. L. Doob, A. M. Dutton, P. S. Dwyer, Mrs. Daisy Edwards, Churchill Eisenhart, H. P. Evans, C. H. Fischer, J. E. Freund, H. M. Gehman, H. H. Germond, M. A. Girshick, Casper Goffman, P. R. Halmos, W. G. Hart, E. H. C. Hildebrandt, Wassily Hoeffding, D. G. Horvitz, Harold Hotelling, A. S. Householder, M. H. Ingraham, Leo Katz, Oscar Kempthorne, J. F. Kenney, W. M. Kincaid, T. C. Koopmans, H. D. Larsen, Walter Leighton, H. B. Mann, A. M. Mark, Jacob Marschak, A. W. Marshall, Kenneth May, M. R. Mickey, Jr., Dorothy J. Morrow, C. J. Nesbitt, M. J. Netzorg, John von Neumann, Jerzy Neyman, G. B. Price, C. J. Rees, J. S. Rhodes, P. R. Rider, F. D. Rigby, Herman Rubin, Arthur Sard, Henry Scheffé, E. D. Schell, I. E. Segal, G. R. Seth, W. B. Simpson, Andrew Sobczyk, K. W. Stacy, C. M. Stein, A. G. Swanson, Zenon Szatrowski, R. M. Thrall, A. W. Tucker, J. W. Tukey, W. A. Wallis, J. E. Walsh, J. E. Wilkins, Jr., S. S. Wilks, M. A. Woodbury.

The Tuesday morning session was devoted to contributed papers. Professor K. J. Arnold of the University of Wisconsin presided. The attendance was approximately forty. The following papers were presented:

1. On Distribution-free Confidence Intervals. Preliminary Report.

Dr. Wassily Hoeffding, Institute of Statistics, University of North Carolina.

2. On Certain Statistics for Samples of 3 from a Normal Population.

Mr. Julius Lieblein, Statistical Engineering Laboratory, National Bureau of Standards. Presented by Dr. Churchill Eisenhart.

3. On Multinomial Distributions with Limited Freedom: a Stochastic Genesis of Pareto’s and Pearson’s Curves.

Professor Maria Castellani, University of Kansas City.

4. Fitting Generalized Truncated Normal Distributions.

Professor Harold Hotelling, Institute of Statistics, University of North Carolina.

5. On the Distribution of the Two Closest Observations Among a Set of Three Independent Observations.

Professor G. R. Seth, Statistical Laboratory, Iowa State College.

6. The Derivation of Certain Recurrence Formulae and their Application to the Extension of Existing Published Incomplete Beta Function Tables.

Dr. T. A. Bancroft, Alabama Polytechnic Institute. (Presented by title.)

On Tuesday afternoon a session for contributed papers was held jointly with the American Mathematical Society. Professor P. S. Dwyer of the University of Michigan presided. The attendance was approximately eighty. The following papers were presented:




7. Asymptotic Studentization in Testing Hypothesis.

Dr. Herman Chernoff, Cowles Commission, University of Chicago.

8. Completeness, Similar Regions and Unbiased Estimation. Preliminary Report.

Professor E. L. Lehmann, University of California, and Professor Henry Scheffé, University of California at Los Angeles.

9. On a Proposed Method for Estimating Populations.

Professor C. C. Craig, University of Michigan.

10. Some Results on the Asymptotic Distribution of Maximum- and Quasi-maximum-likelihood Estimates.

Dr. Herman Rubin, Institute for Advanced Study.

11. The Probability Points of the Distribution of the Median in Random Samples from any Continuous Population.

Dr. Churchill Eisenhart, Lola S. Deming and Celia S. Martin, Statistical Engineering Laboratory, National Bureau of Standards.

12. On the Arithmetic Mean and the Median in Small Samples from the Normal and Certain Non-normal Populations.

Dr. Churchill Eisenhart, Lola S. Deming and Celia S. Martin, Statistical Engineering Laboratory, National Bureau of Standards.

13. The Relative Frequencies with which Certain Estimators of the Standard Deviation of a Normal Population Tend to Underestimate Its Value.

Dr. Churchill Eisenhart and Celia S. Martin, Statistical Engineering Laboratory, National Bureau of Standards.

14. Some Non-parametric Tests of Whether the Largest Observations of a Set are too Large. Preliminary Report.

Dr. J. E. Walsh, Project Rand, Santa Monica, California.

15. On Some Bounded Significance Level Properties of the Equal-tail Sign Test for the Mean.

Dr. J. E. Walsh, Project Rand, Santa Monica, California. (Presented by title.)

16. Infinitely Divisible Distributions.

Professor Will Feller, Cornell University. (Presented by title.)

17. Fluctuation Theory of Recurrent Events.

Professor Will Feller, Cornell University. (Presented by title.)

18. Formulae for the Percentage Points of the Distributions of the Arithmetic Mean in Random Samples from Certain Symmetrical Universes.

Mr. Uttam Chand, University of North Carolina and National Bureau of Standards. (Presented by title.)


Abstracts of the contributed papers appear elsewhere in this issue of the Annals.

On Wednesday morning the Institute and the Econometric Society held a joint session on Stochastic Processes, with Professor Harold Hotelling of the University of North Carolina presiding. Attendance was approximately ninety. Professor Hotelling presented an Historical Summary of the Problem. Professor J. L. Doob of the University of Illinois presented a paper, Stochastic Difference Equations and Stochastic Differential Equations. Professor [...] of the University of Chicago presented a paper, Brownian Motion, Dynamical Friction and Stellar Dynamics.

The three joint sessions of the Institute and the Econometric Society on Thursday were devoted to a symposium on the Theory of Games. The first morning session was under the chairmanship of Professor S. S. Wilks of Princeton University.
Professor John von Neumann of the Institute for Advanced Study presented a paper, Survey of the Theory of Games. Professor Oskar Morgenstern of Princeton University presented a paper, Economics and the Theory of Games. Dr. M. A. Girshick of Project Rand presented a paper, Statistics and the Theory of Games. The second morning session was under the chairmanship of Professor John von Neumann of the Institute for Advanced Study. Dr. E. W. Paxson of Project Rand presented a paper, Recent Developments. Professor J. W. Tukey of Princeton University presented a paper, A Problem in Strategy. Dr. G. B. Dantzig of the Army Air Forces presented a paper, Programming in a Linear Structure. The final session of the symposium was a round table discussion with Professor John von Neumann of the Institute for Advanced Study as chairman and with the following participants: Dr. G. B. Dantzig, Dr. M. A. Girshick, Professor Harold Hotelling, Professor Irving Kaplansky, Professor Samuel Karlin, Dr. J. C. C. McKinsey, Professor Oskar Morgenstern, Dr. E. W. Paxson, Dr. L. S. Shapley, and Professor J. W. Tukey.

A membership business meeting was held on Friday, September 10, in Bascom Hall, at which twenty-one members were present. An account of the business transacted at this meeting may be found elsewhere in this issue under the heading “Adoption of the New Constitution.”

The final session was on Sequential Estimation and was held jointly with the Econometric Society on Friday morning, with Professor Jerzy Neyman of the University of California presiding. Attendance was approximately fifty. Professor Charles Stein of the University of California presented a paper on Sequential Estimation. Professor W. A. Wallis of the University of Chicago presented a discussion.

Social affairs during the meeting included a tea Tuesday afternoon, a concert 
of the Pro Arte String Quartet Tuesday evening, a dinner Wednesday evening, 
a picnic Thursday afternoon, and a beer party Thursday evening. 

K. J. Arnold
Assistant Secretary 








